ABSTRACT


The objective of this research has been the creation of a hardware design for a 
Predictive Read Cache (PRC). The PRC is a developmental cache intended to replace 
second-level caches common in modern microprocessor systems. The PRC has the potential
of being faster and cheaper than current second-level caches and is distinctive in its ability to
predict data addresses to be referenced by a central processing unit.

Previous research has analyzed the behavior that the PRC must exhibit. During the
described research, the behavior was modeled in the Verilog hardware description language.
Verilog-XL, which takes the Verilog behavioral model as input, was used for simulation. The
behavioral model suggests that the internal structure of the PRC could be divided into six 
modules, each performing part of the function of the whole PRC. Each of these blocks was 
studied for hardware equivalents, easing the development of the total structural model. 

Using Verilog structural models as input, Epoch was used to automatically perform a
very large-scale integrated (VLSI) circuit layout and to generate timing information. The
Epoch output files are used for further simulation with Verilog-XL to identify critical parts
of the design. The result of this research is a complete hardware design for the PRC.


I. INTRODUCTION


A. HISTORY 


Billingsley and Fouts demonstrated the viability of using 
an address predicting buffer to reduce memory latency in 
computer systems. “The implementation of a MPB [Memory 
Prediction Buffer] is less expensive than a next-level cache 
and delivers a comparable performance enhancement.” 
(Billingsley, 1992) 

With this in mind, Nowicki designed a Read Prediction 
Buffer (RPB) as part of his thesis work in 1992 (Nowicki, 
1992). This RPB was capable of prefetching data based on the 
previous pattern of memory accesses. Continuing the work of 
Nowicki, Aguilar tested that design and suggested several 
enhancements to improve it (Aguilar, 1995). A tentative
design of this new Predictive Read Cache (PRC) was a part of 
his thesis work. 

Aguilar proposed a design consisting of six modules which
together would comprise the PRC. He designed four of those
six modules, testing each independently, but not together.


B. PRINCIPLE OF OPERATION


The Predictive Read Cache stores data only, not
instructions. The design is based on two observations
about data fetches from main memory. First, within a
specific block of data, the accesses often occur in sequential
patterns such as every element in order, or every other
element in reverse order. The second observation is that a
program often uses several blocks of data concurrently.

The PRC takes advantage of the access patterns to predict 
future memory access addresses. The prediction is based on a 
linear displacement of the addresses. The PRC calculates the 
difference between two given addresses, then adds the 
difference to the most recent address to arrive at the 
predicted address. For example, if the Central Processing 
Unit (CPU) accesses the data at address 20h (hexadecimal 20) 
and then at address 40h, the PRC predicts that the CPU soon 
will need the data at 60h. Once the PRC has predicted an
address, it fetches the data from that address. Once the data
is stored in the PRC, the PRC can deliver that data to the CPU
much more quickly than the main memory could deliver the data.
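
The linear-displacement prediction described above can be
sketched in a few lines of Python. This is only an
illustration of the arithmetic; the function name and the
reverse-order addresses are assumptions and do not come from
the PRC models.

```python
def predict_next(previous_addr: int, current_addr: int) -> int:
    """Linear-displacement prediction: current + (current - previous)."""
    stride = current_addr - previous_addr
    return current_addr + stride


# Example from the text: accesses at 20h and then 40h predict 60h.
forward = predict_next(0x20, 0x40)

# A descending pattern (illustrative addresses) yields a negative stride.
reverse = predict_next(0x200, 0x1C0)

print(f"{forward:X}h {reverse:X}h")
```

The same calculation handles ascending and descending
patterns, because the sign of the stride carries the
direction.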


The PRC handles multiple data blocks through its “lines.” 
Each line is capable of tracking the pattern of accesses 
within a unique block of data. Thus, the PRC can track only 
as many access patterns as it has lines. 

When the cache is full and a new access pattern begins, 
a line has to be replaced. Lines that have not been used 
recently become aged. Aged lines are the first to be replaced 
when the cache is full. 

Data incoherency is avoided through the process of
flushing lines. When a line is flushed, that line is marked 
as containing invalid data and is made available for tracking 
new access patterns. If the CPU writes data to an address from 
which the PRC has prefetched data, the PRC flushes the line 
with that data. 


C. RESEARCH GOALS


The objective of this research is to create a complete
hardware design of the PRC. Completing the design has
priority over the performance, though the performance must be
better than the performance of main memory for this design to
be of any value.

The performance is measured in terms of the rate at which
the Central Processing Unit (CPU) can access the data in the
PRC. In the microprocessor system for which this PRC design
is created, data accesses occur in groups. The groups are
called "bursts." Each access within a burst is called a
"beat." With a 60-ns memory and a 66-MHz system clock, the
four-beat burst operation takes 8-3-3-3 cycles, that is, eight
cycles for the first beat and three more cycles for each of
the three remaining beats. The design of the PRC must perform
at least this well and preferably much faster.
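
As a rough check of what the 8-3-3-3 figure means in time, the
following Python sketch converts the cycle counts into
nanoseconds. It assumes a clock of exactly 66 MHz; the names
are illustrative only.

```python
CLOCK_HZ = 66e6                 # assumed exact; the text says 66 MHz
BURST_CYCLES = (8, 3, 3, 3)     # cycles per beat for the 60-ns memory


def burst_latency(cycles_per_beat, clock_hz):
    """Return (total cycles, total time in ns) for one burst."""
    total_cycles = sum(cycles_per_beat)
    return total_cycles, total_cycles * 1e9 / clock_hz


total_cycles, latency_ns = burst_latency(BURST_CYCLES, CLOCK_HZ)
print(total_cycles, round(latency_ns, 1))
```

Under these assumptions a four-beat burst from main memory
occupies 17 cycles, which is the baseline the PRC has to beat.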


D. THESIS STRUCTURE 


The Testbench, the Verilog model of the environment in
which the PRC is expected to operate, is presented first.
This description includes a summary of the bus
protocol and results of tests that show the correct
performance of the Testbench.

The description of the behavioral model design phase is
presented next. This chapter presents a simple pseudocode
model of the PRC which is used to develop an appropriate data
structure and block diagram for the PRC. The individual
blocks are each modeled with Verilog and then connected
together in the Testbench to verify that the entire PRC works
as desired.

Once the behavioral model design phase is complete, each 
block is converted into a hardware (structural) model. This 
phase of the design is detailed in Chapter IV. 

This thesis also contains a description of the Computer
Aided Design (CAD) tools used for this research. The
descriptions include tips for making their use easier and
descriptions of any problems encountered.


II. TESTBENCH


This chapter describes the Testbench, the environment in 
which the Predictive Read Cache (PRC) was designed to operate. 
In particular, it summarizes the bus arbitration protocol and
explains the important aspects of each part of the Testbench. 
The chapter concludes with the test results of the Testbench 
itself. 


A. OVERVIEW OF TESTBENCH 


The Testbench models and simulates the environment in
which the PRC design was tested. As indicated in Figure 1, it
comprises four blocks, one of which is the PRC itself. The
Testbench was developed with Verilog behavioral models. The
CPU module simulates various functions of a PowerPC-603. The
Memory module simulates the behavior of a 60-ns dynamic random
access memory (DRAM). The Arbiter controls access to both the
address and data busses. Each of these modules is described
in more detail in the following sections, after a description
of the PowerPC-603 bus protocol.


Figure 1. Block Diagram of Testbench.


There were four major decisions made regarding the design 
of the Testbench. The first decision was to use a PowerPC-603 
microprocessor system as the environment in which this PRC 
will operate. The work of Aguilar was started using the ‘603 
(Aguilar, 1995). It is still a current member of the PowerPC 
family; the protocol should not be out of date for quite some 
time. 

The second design decision was to limit the ‘603 to in- 
order transactions. The ‘603 is capable of performing certain 
sequences of data transfers out of order. That is, the order 
of the data bus cycles can be different from the order of the 
address bus cycles. Prohibiting these transactions made the 
CPU model simpler and simplified the design of the PRC. This 
did not undermine the demonstration of the PRC as a viable 
memory management tool. 

The third design decision was to use a 66-MHz system bus
and CPU clock rate. Sixty-six MHz is a reasonably fast system
bus speed. Designing for a slower bus speed could severely
reduce the applicability of this design to modern systems.

The fourth decision was to use the 64-bit data bus instead of
the optional 32-bit configuration. When configured with the
64-bit data bus, the PowerPC-603 can access memory in one of
two modes: single-beat or four-beat burst. A single beat is
one memory access of one to eight bytes. A four-beat burst is
a sequence of four sequential memory accesses, eight bytes per
beat, totaling 32 bytes. When configured with the 32-bit data
bus, the '603 can access memory in one of three modes: single-
beat (one to four bytes), two-beat burst (eight bytes), or
eight-beat burst (32 bytes). Data transfers are less
complicated with the 64-bit data bus since there are fewer
transfer options and a smaller number of beats. Also, the
time from one cache miss to the next is independent of the
data bus size. Since a burst transfer on the 32-bit bus takes
more cycles, there is much less time between cache misses for
the PRC to do its job, perhaps too little time. Further, the
32-bit mode is specific to the '603; therefore, the PRC would
have to be redesigned to be used with the other 64-bit bus
members of the PowerPC family. A disadvantage of the 64-bit
option is the increased number of pins required for the PRC,
from about 108 to about 140.


B. SUMMARY OF '603 PROTOCOL 


The PowerPC-603 has separate data and address busses,
each with independent cycles, referred to as tenures by the
Motorola engineers. Each tenure has three phases: Arbitration,
Transfer, and Termination.


The system has a bus arbitration unit which controls the
passing of bus mastership between the requesting units. In
this implementation, the CPU and the PRC are the only
candidates for bus mastership. Module Arbiter is the
bus arbitration unit.

When a unit wants the bus, it asserts BR_ (bus request).
If the unit can have the bus next, the arbiter asserts BG_
(bus grant) back to the unit. Then the unit waits, if
necessary, for the previous master to finish its tenure, after
which the unit takes mastership by asserting ABB_ (address bus
busy). When the current master is done with the address bus,
it negates ABB_.

This system has no external cache or multiple processors; 
thus, there are no address-only transactions. If a unit wants 
the address bus, it will also want the data bus. After 
granting the address bus by asserting BG_, the arbiter then 
grants the data bus by asserting DBG_. 

Both BG_ and DBG_ remain asserted until the requesting 
unit takes mastership or withdraws its request by negating 
BR_. If there are no pending bus requests, the arbiter "parks" 
the CPU by granting it the busses. If the CPU is parked, it 
does not have to take the time to request the bus, thereby 
reducing the time for the memory access. If the CPU is parked 
and the PRC requests the bus, the arbiter unparks the CPU and 
grants the bus to the PRC. 


C. TESTBENCH


The Testbench is the highest level in the design
hierarchy. It connects the CPU, PRC, memory, and arbitration
unit. This module establishes the system clock rate and
controls the simulation time.


D. CPU 


The CPU module simulates PowerPC-603 memory accesses.
The Sequencer is a sub-module of the CPU which makes the
Testbench able to simulate every transaction relevant to the
memory and PRC. These transactions can occur in any order.
Many of the possible '603 transactions are not applicable to
this particular system configuration. For example, none of
the "address only" transactions are relevant, since they are
for systems with multiple processors or second-level caches.
Bus arbitration is accurately modeled, including the pipelined
address tenures.


E. MEMORY 


This module emulates the main memory of the system. For
simulation efficiency, the memory has only enough physical
address space for four four-beat burst reads: 128 bytes. The
address bus width allows a virtual address space of four
Gbytes. Accesses to addresses past the first 128 bytes map to
addresses within the first 128 bytes.
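
One simple way to obtain this aliasing is to keep only the low
seven address bits. The Python sketch below assumes that
modulo mapping; the text does not state which mapping
memory.v actually uses, and the names are illustrative.

```python
PHYSICAL_BYTES = 128   # physical size of the simulated memory


def physical_address(addr: int) -> int:
    """Fold a 32-bit address into the 128-byte physical space.

    Assumption: the fold is a plain modulo (low seven bits kept).
    """
    return addr % PHYSICAL_BYTES


# Addresses 128 bytes apart land on the same physical location.
print(physical_address(0x00), physical_address(0x180), physical_address(0x1A0))
```

A consequence of any such fold is that distinct addresses in a
test sequence can return the same stored data, which matters
when reading simulation traces.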

The time required for memory accesses is determined by
the parameters Delay1 and Delay2. The heading in
the file memory.v describes how to adjust these parameters to
achieve a realistic memory access rate.

There were two significant decisions made about the main 
memory design. First, the memory emulates a 60-ns DRAM 
memory. With a 60-ns memory and a 66-MHz system clock, the
four-beat burst operation takes 8-3-3-3 cycles, that is, eight 
cycles for the first beat and three more cycles for each of 
the three remaining beats. 

The second design decision was to add a cancel feature to 
the main memory chip. The memory module has an input called 
CANX which cancels the current read operation. It is through 
this signal that the PRC stops the memory module from 
delivering data to the CPU when the PRC already has the data. 

Another option would be to put the PRC between the CPU
and Memory, not allowing a read request to get to the memory
chip until after the PRC had checked its contents. This
scheme would increase the time of all memory accesses.


F. ARBITER


The Arbiter emulates the external bus arbitration unit, 
implemented as a Finite State Machine (FSM) corresponding to 
the state diagram in Figure 2. 

The memory unit in this Testbench is capable of handling
up to two memory accesses in the pipeline at a time, which is
the maximum that the CPU will ever cause. Adding the PRC to
the system creates the possibility of three accesses in the
pipe. For example, the PRC could initiate a third address
tenure before the first of two CPU transactions is complete.
This potential problem is handled by the Arbiter, which keeps
track of the pipelining depth. It will not grant the address
bus to any unit if an address tenure would put a third
transaction in the pipeline. Rather, the Arbiter will stall
until the data tenure from the first transaction is complete,
after which the Arbiter will grant the address bus to the
requesting unit.
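
The depth-tracking rule can be illustrated with a small Python
sketch. It is a simplified stand-in for the Arbiter's
bookkeeping, not a transcription of the Verilog FSM; the class
and method names are hypothetical.

```python
class PipelineDepthTracker:
    """Counts outstanding transactions so a third is never admitted."""

    MAX_DEPTH = 2  # the Testbench memory handles two pipelined accesses

    def __init__(self) -> None:
        self.depth = 0

    def can_grant_address_bus(self) -> bool:
        return self.depth < self.MAX_DEPTH

    def start_address_tenure(self) -> None:
        if not self.can_grant_address_bus():
            raise RuntimeError("a third transaction would enter the pipeline")
        self.depth += 1

    def finish_data_tenure(self) -> None:
        if self.depth == 0:
            raise RuntimeError("no transaction is outstanding")
        self.depth -= 1


tracker = PipelineDepthTracker()
tracker.start_address_tenure()                 # first CPU transaction
tracker.start_address_tenure()                 # second CPU transaction
stalled = not tracker.can_grant_address_bus()  # a PRC request must wait
tracker.finish_data_tenure()                   # first data tenure completes
granted = tracker.can_grant_address_bus()      # the PRC may now be granted
print(stalled, granted)
```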


States: Start, Grant CPU addr bus, Park CPU, Grant CPU data
bus, Grant PRC addr bus, Wait for PRC, Grant PRC data bus.
Inputs: [CPU_BR_, PRC_BR_]
Outputs: [CPU_BG_, CPU_DBG_, PRC_BG_, PRC_DBG_]
Numbers refer to Verilog state numbers.

Figure 2. State diagram for Arbiter FSM.


G. TEST RESULTS


Testing the Testbench itself was important to establish
that the models matched the behavior described in the PowerPC
User's Manual. The Testbench passed all tests of reads,
writes and burst operations, in various sequences of
transactions and using an assortment of memory access delays.

Figure 3 shows the fastest possible burst operations, as
if the memory access time were not the limiting factor. Note
again that the address tenure of the second transaction can
start before the data tenure of the first transaction is
complete.


Figure 3. Fastest possible burst read. Delay=0. [cWaves
output; traces include CPU1 clk, BR_, BG_, ABB_, TS_, AACK_,
DBG_, DBB_ and TA_]


Figure 4 shows a burst write transaction with an access
delay of three cycles and a delay of one cycle in between each
beat. A realistic 60-ns DRAM will have a delay of 8-3-3-3
rather than the 3-1-1-1 shown here. The PRC, however, should be
able to supply data this quickly.


III. PRC BEHAVIORAL MODEL DESIGN PHASE


This chapter presents the development of the behavioral
models for the PRC. A simple pseudocode model is presented
first. This model was used to develop an appropriate data
structure and block diagram for the PRC. The individual
blocks in this block diagram were implemented with Verilog
behavioral modules and tested together to verify the
behavioral model of the PRC. The next step was to convert
each module into a hardware model compatible with Epoch,
detailed in the next chapter.


A. PSEUDOCODE MODEL 


The behavior of the PRC is explained in detail in the
paper by Fouts & Billingsley (1994, p.113) and summarized in
the Introduction chapter of this thesis. Another way of
summarizing this behavior is through a pseudocode model as
shown in Figure 5, which is just detailed enough to identify
the most significant capabilities the PRC must have. The
purpose of taking this approach was to clarify the function of
the PRC and to aid in identifying specific behaviors of this
cache which the hardware needs to exhibit.


B. DATA STRUCTURE 


A possible data structure for the PRC is shown in Figure
6. Each of the 128 lines within the PRC must contain two
addresses, some status information and data. The two
addresses are required to maintain the memory access pattern.

There are also two seven-bit pointers, each containing a
value in the range of zero to 127. The ActiveLine pointer
contains the number of the line that is currently being used
by the PRC. The ReplaceLine pointer contains the number of
the next line to be replaced when a new line is needed.
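
For readers who prefer code to diagrams, the data structure
can be written down as a pair of Python dataclasses. This is
a software sketch only; the field and class names are
illustrative, and the 27-bit address width is taken from the
PredMA(0:26) and MRMA(0:26) labels of Figure 6.

```python
from dataclasses import dataclass, field

NUM_LINES = 128    # lines in the PRC
LINE_BYTES = 32    # one four-beat burst per line
ADDR_BITS = 27     # PredMA(0:26) and MRMA(0:26)


@dataclass
class Line:
    pred_ma: int = 0                  # Predicted Memory Address
    mrma: int = 0                     # Most Recent Memory Address
    valid: bool = False               # V flag
    aged: bool = False                # A flag
    data: bytes = bytes(LINE_BYTES)   # 32 bytes of prefetched data


@dataclass
class PrcState:
    lines: list = field(
        default_factory=lambda: [Line() for _ in range(NUM_LINES)])
    active_line: int = 0              # line currently in use
    replace_line: int = 0             # next line to be replaced


state = PrcState()
# Seven bits are enough to index lines 0 through 127.
pointer_bits = (NUM_LINES - 1).bit_length()
print(len(state.lines), pointer_bits)
```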


*** PRC PSEUDOCODE ***


// CAR = current address register
// MRMA = most recent memory address
// PredMA = predicted memory address


always at negative edge of HRESET_
    clear all status flags;
    put PRC in IDLE state;
    ActiveLine = 0; ReplaceLine = 0;


always
    wait for next transaction;

    CASE (transaction)

    data burst-read:
        if CAR hits in PRC, //PRC has requested data
            switch ActiveLine to line that was hit;
            send data to CPU;
            send cancel signal to memory;
            predict next address;
            if next address is not already in PRC,
                read next address;
                store in ActiveLine;
                update MRMA and PredMA;

        else if CAR misses, //PRC does not have requested data
            switch ActiveLine to the next ReplaceLine;
            if this is the first miss for this line,
                store this address in MRMA;
            if this is the second miss for this line,
                initiate search for next ReplaceLine;
                predict next address;
                if next address not already in PRC,
                    read next address;
                    store in ActiveLine;
                    update MRMA and PredMA;

    burst-write, or write:
        if CAR hits in PRC,
            flush matching line;

    non-burst read or instruction transaction:
        ignore;

    endcase;


*** END ***


Figure 5. PRC Pseudocode Model 


DATA STRUCTURE

Each line: PredMA (0:26) | MRMA (0:26) | V | A | DATA (32 bytes)

PredMA = Predicted Memory Address
MRMA = Most Recent Memory Address
V = Valid
A = Aged

Pointers: ActiveLine, ReplaceLine


Figure 6. PRC Data Structure.


C. BLOCK DIAGRAM


The pseudocode model revealed several specific tasks the 
PRC must be able to accomplish. Identifying and clarifying 
these tasks resulted in the development of six blocks within 
the PRC. These blocks are shown in the block diagram of 
Figure 7 and are described briefly here. 

The Snooper watches transactions between the CPU and
memory, raising appropriate signals if the transaction is one
in which the PRC is interested.

The Line Manager contains the Address List and Line
Replacement Unit as sub-blocks. The Address List contains all
the recently-accessed memory addresses and all the predicted
addresses. The Line Replacement Unit determines which of the
128 lines will be replaced the next time a new line is needed.
These two blocks are grouped together because they share
status information about the lines and work closely together
for line management.

The Predictor module uses its two input addresses to
compute its output address.

The Data List stores 128 lines of data, 32 bytes in each
line, which is the amount of data in each burst read or burst
write.

The Bus Interface handles the protocol of data transfers
in to and out of the PRC.

The Controller coordinates the actions of all
the other functional blocks to accomplish the mission of the
PRC.


PRC BLOCK DIAGRAM

CAR = Current Addr Register
NAR = Next Addr Register

Figure 7. PRC Block Diagram.


D. CONTROLLER


This module is a Finite State Machine which coordinates
the actions of all the other functional blocks of the PRC.
All control signals are synchronous with the system clock.
HRESET_ causes the Controller to go to the IDLE state. The
state output table and state diagram are shown in Figures 8
and 9, respectively.


Controller State Output Table

States: IDLE, TEST_CAR(R), SEND_DATA, TEST_NAR, FETCH_DATA,
LINE_EMPTY, PREDICT_NA, STORE_CAR, TEST_CAR(W), FLUSH_LINE.
Outputs legible in the column headings: a_select, test, store,
predict, new_replace, hold, fetch.
a_select values: IDLE = X, TEST_CAR(R) = CAR, SEND_DATA = X,
TEST_NAR = NAR, FETCH_DATA = NAR.

Figure 8. Controller State Output Table.
Figure 9. Controller State Diagram.


E. SNOOPER


This module watches the system bus activity and makes
the information available to the PRC Controller.

If the transaction is a data burst read or any kind of
write and if the address parity is correct, then two actions
take place. First, read or write is asserted as appropriate.
Second, the address is placed in the Current Address Register
(CAR). The snoop_ignore signal tells this unit to ignore the
current transaction, because it was initiated by the Bus
Interface Unit. The snoop_ignore signal must be asserted
concurrently with the transfer attributes.

Reads that are not burst reads or data related are 
ignored by the PRC. The CAR is updated only on transactions 
relevant to the PRC. 

Due to the two-stage pipelining capability of the PowerPC
with respect to memory accesses, a second address tenure can
occur shortly after the first, well before the first data
tenure is complete. To compensate for this, the read and
write outputs of the Snooper remain asserted until acknowledged
by the Controller with hold. The rising edge of hold
indicates that the read or write signal was received by the
Controller. The Snooper then can negate these signals but
must leave CAR alone until hold is negated. After hold is
negated, CAR can be updated to the new address.


F. LINE MANAGER


This module contains the address list, status flags for 
each line (Valid, Aged), a general status flag (line_empty), 
the line replacement unit, and a couple of pointers 
(ActiveLine, ReplaceLine). On HRESET_, Valid=0 (all lines), 
Aged=0 (all lines), line_empty=1, ActiveLine=0. 

The MRMA output is always the MRMA of the ActiveLine. 
The line_empty flag indicates that the currently active line
has no addresses in it yet; therefore, the addresses cannot be 
used by the PRC to make a prediction. 

The input a_select determines which address input is used 
for a particular operation. The two address inputs are the CAR 
and the NAR. 

When the Line Manager receives a test signal, it compares
the input address with the contents of the PredMA List. If
there is a match with the CAR, it asserts the hit signal and
changes the ActiveLine pointer to the line number of the hit.

If there is a miss with the CAR, then the ActiveLine
switches to the same line to which ReplaceLine points.

If, during a test, there is a match with the NAR, two
actions occur. First, hit is asserted. Second, the value of
ActiveLine becomes irrelevant since it will not be used. If
there is a miss with the NAR, the ActiveLine must remain
unchanged from the test.

The fetch_done signal from the Bus Interface causes the
NAR to be stored in PredMA[ActiveLine], the CAR to be stored
in MRMA[ActiveLine], the Valid flag to be set, and the Aged
flag to be reset.

The flush signal causes the current ActiveLine to become
invalid by setting Valid[ActiveLine] = 0.

The store signal causes the input address to be stored
into the MRMA of the ActiveLine. This is only used for the
first address in a new line. The store signal also causes the
line_empty flag to be reset.


Line replacement: ReplaceLine always points to the line
to be replaced at the next PRC miss. HRESET_ causes this to
be zero.

As soon as the PRC starts predicting the first address
for a line it asserts new_replace. The replacement unit then
finds a new line to mark as the next ReplaceLine according to the
following procedure.


Done = false;
repeat
    ReplaceLine = ReplaceLine + 1; (mod 128 addition)
    if not (Valid[ReplaceLine])
        Done = true;
    elseif (all_lines_are_valid AND Aged[ReplaceLine]) then
        Done = true;
    else
        Aged[ReplaceLine] = 1;
until Done;


line_empty = 1;


In words, the Line Replacement Unit searches sequentially
for the next line with invalid data and marks that line as the
next line to be replaced. If all lines contain valid data,
then it scans for the next line that is "aged," indicated by
a set Aged flag. As it scans for an aged line, it sets the
Aged bits in the "unaged" lines it passes. Therefore, as it
wraps around in the search for an aged line, it will
eventually come upon one, even if none were aged when the
search began.
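
The search procedure can be exercised directly in software.
The Python sketch below follows the pseudocode under one
assumption: the all-lines-valid condition is evaluated once at
the start of the search, since the Valid flags do not change
during it. The function name is hypothetical.

```python
NUM_LINES = 128


def next_replace_line(replace_line, valid, aged):
    """Return the next ReplaceLine; sets Aged on valid lines passed over."""
    all_valid = all(valid)   # assumption: sampled once per search
    while True:
        replace_line = (replace_line + 1) % NUM_LINES
        if not valid[replace_line]:
            return replace_line          # an invalid line is taken first
        if all_valid and aged[replace_line]:
            return replace_line          # otherwise the next aged line
        aged[replace_line] = True        # age the line and keep scanning


# Case 1: lines 0 to 2 hold valid data, so line 3 is chosen.
valid = [True] * 3 + [False] * (NUM_LINES - 3)
aged = [False] * NUM_LINES
first = next_replace_line(0, valid, aged)

# Case 2: the cache is full and nothing is aged. The scan ages every
# line on the first pass and stops at the first line it revisits.
full_valid = [True] * NUM_LINES
full_aged = [False] * NUM_LINES
second = next_replace_line(5, full_valid, full_aged)
print(first, second)
```

In the worst case the scan visits each line once before
stopping, so the search is bounded by one full pass plus one
step.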

All of this occurs while the PRC is fetching data.
Therefore, the PRC has several clock periods in which to
complete the search.


G. PREDICTOR


The Predictor module has two address inputs, the Most
Recent Memory Address (MRMA) and the Current Address (stored
in the Current Address Register, CAR). It has a single
output, the Next Address, which is stored in the Next Address
Register, NAR.

This module calculates the Next Address based on the Most
Recent Memory Address and the Current Address. The rising
edge of predict initiates the prediction calculation. The
original equation is

    NAR = CAR + (CAR - MRMA)

which is implemented as

    NAR = 2*CAR - MRMA.
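
The two forms agree even when the result wraps around a
fixed-width register, which is the case that matters in
hardware. The Python check below assumes a 32-bit address
(consistent with the four-Gbyte address space mentioned for
the Testbench) and models the doubling as a one-bit left
shift; both choices are illustrative assumptions.

```python
ADDR_BITS = 32                    # assumed register width
MASK = (1 << ADDR_BITS) - 1


def nar_original(car: int, mrma: int) -> int:
    """NAR = CAR + (CAR - MRMA), truncated to the register width."""
    return (car + (car - mrma)) & MASK


def nar_implemented(car: int, mrma: int) -> int:
    """NAR = 2*CAR - MRMA, with the doubling done as a left shift."""
    return ((car << 1) - mrma) & MASK


pairs = [(0x40, 0x20), (0x1A0, 0x180), (0x20, 0x40), (0x00, 0x20)]
results = [(nar_original(c, m), nar_implemented(c, m)) for c, m in pairs]
print([f"{a:X}h" for a, _ in results])
```

The last pair shows a descending pattern that runs below
address zero; in this sketch the result simply wraps, and how
the actual design treats that case is not stated here.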


The output NAR remains latched and valid until the next
rising edge of predict.


H. DATA LIST


The inputs to the Data List are upload, download and
ActiveLine. The 256-bit bus data_line is both an input and an
output. An upload signal causes the Data List to store the data
on data_line into the address specified by ActiveLine. A
download signal causes the Data List to assert onto data_line
the data in the address specified by ActiveLine.


I. BUS INTERFACE UNIT


This module handles the protocol of data transfers in to 
and out of the PRC, coordinating these activities through the 
use of a Finite State Machine. 

When this module receives a fetch signal, it latches the
address in the NAR and requests the bus for a burst read. It
stores the incoming data until all four beats have been
received. Then, it uploads the data into the Data List and
asserts fetch_complete.

When this module receives a send signal, it sends a
cancel signal (CANX) to the memory module, downloads data from
the Data List and then sends the data to the CPU. When the
transfer is finished, it asserts send_done.


J. PREDICTION TESTS

There are two large-scale tests included in this thesis.
The first is the Prediction Test. The second is the Line
Replacement Test. Together, these tests are sufficient to
demonstrate that the behavioral model functions as desired.
Once the behavioral model of the PRC passed these tests, it
was ready for conversion to a hardware model.

The tests are both conducted by connecting the behavioral
model of the PRC to the Testbench described in the previous
chapter and running a simulation with a sequence of events.
The sequence of events for the Prediction Test is included in
the sequencer4.v file. The sequence of events for the Line
Replacement Test is located in the sequencer5.v file. The
following procedure lists the steps necessary to conduct a
test:


1. Change directories (cd) to the ...verilog/behavioral
directory.


2. Modify the file verilog_arguments so that it contains
sequencer4.v or sequencer5.v as desired and all the
parts to the PRC and to the Testbench.


3. Modify the file testbench.v to set the simulation
duration as described in the heading of the desired
sequencer. Modify the trace flags in every file
listed in verilog_arguments as described in the
sequencer file.


4. At the Unix command prompt, enter the command verilog
-f verilog_arguments.


The Verilog-XL outputs of both tests are included in the 
appendices. Together, these tests show that this behavioral 
model performs all the desired functions. 

The Prediction Test, using Sequencer4, causes a series of
CPU transactions that tests the ability of the PRC to make the
prediction calculation and to fetch the data. The
transactions are as follows:


burst_read at 00h: The PRC stores this address.

burst_read at 20h: The PRC should predict a next
address of 40h and then fetch the
data from that address.

burst_read at 180h: The PRC should store this address in
a new line.

burst_read at 1A0h: The PRC should predict a next
address of 1C0h and then fetch the
data from that address.

burst_read at 40h: This data is already in the PRC, so
the PRC should send it to the CPU
and then fetch data from 60h.

burst_write at 1C0h: This data is in the PRC, so this
line should be flushed.

burst_read at 60h: The PRC should deliver this data to
the CPU and then fetch the data at
80h.

burst_read at 100h: The PRC should start a new line and
store this address.
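
The expected outcome of this sequence can be reproduced with a
very small software model. The Python sketch below is a
simplified reading of the pseudocode of Figure 5, not the
Verilog behavioral model: it omits data, timing and aging, it
picks the next invalid line as ReplaceLine, and all names are
hypothetical.

```python
class TinyPrc:
    """Simplified sketch of the PRC line bookkeeping (no data, no timing)."""

    def __init__(self, num_lines: int = 128) -> None:
        self.pred_ma = [None] * num_lines
        self.mrma = [None] * num_lines
        self.valid = [False] * num_lines
        self.active = 0
        self.replace = 0
        self.line_empty = True
        self.log = []

    def _find(self, addr):
        # A hit means a valid line whose predicted address matches.
        for line, (is_valid, pred) in enumerate(zip(self.valid, self.pred_ma)):
            if is_valid and pred == addr:
                return line
        return None

    def _predict_and_fetch(self, car):
        nar = 2 * car - self.mrma[self.active]
        if self._find(nar) is None:
            self.pred_ma[self.active] = nar
            self.mrma[self.active] = car
            self.valid[self.active] = True
            self.log.append(f"fetch {nar:X}h into line {self.active}")

    def _advance_replace(self):
        # Assumption: take the next invalid line; aging is left out.
        count = len(self.valid)
        for step in range(1, count + 1):
            candidate = (self.replace + step) % count
            if not self.valid[candidate]:
                self.replace = candidate
                break
        self.line_empty = True

    def burst_read(self, car):
        hit = self._find(car)
        if hit is not None:
            self.active = hit
            self.log.append(f"send {car:X}h from line {hit}")
            self._predict_and_fetch(car)
            return
        self.active = self.replace
        if self.line_empty:               # first miss for this line
            self.mrma[self.active] = car
            self.line_empty = False
            self.log.append(f"store {car:X}h in line {self.active}")
        else:                             # second miss for this line
            self._predict_and_fetch(car)
            self._advance_replace()

    def write(self, car):
        hit = self._find(car)
        if hit is not None:
            self.valid[hit] = False
            self.log.append(f"flush line {hit}")


prc = TinyPrc()
sequence = [("r", 0x00), ("r", 0x20), ("r", 0x180), ("r", 0x1A0),
            ("r", 0x40), ("w", 0x1C0), ("r", 0x60), ("r", 0x100)]
for op, addr in sequence:
    if op == "r":
        prc.burst_read(addr)
    else:
        prc.write(addr)
print("\n".join(prc.log))
```

Running the sketch gives one log entry per expected action in
the list above, which is a convenient cross-check when reading
the Verilog-XL traces.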


This test successfully demonstrates a majority of the
capabilities of the PRC, showing when the Line Manager selects
new lines, when and how the Predictor functions, and when the
CPU starts a read or write and the data involved. The test
shows when the Bus Interface Unit fetches data from memory.
The Data List reports the flow of data in and out of itself.

The only significant behavior not exercised by this test
is the function of the Line Replacement Unit when the PRC is
full. That is handled with Sequencer5 in the Line Replacement
Test.

The Line Replacement Test was accomplished by a series of
CPU transactions that quickly fill the PRC. The test shows
that the Line Replacement Unit correctly selected invalid
lines to be replaced first. When all the lines in the PRC
contained valid data, the Line Replacement Unit executed the
algorithm described in the section on the Line Replacement
Unit.


K. CONCLUSION 


At this point in the development of the PRC, the 
behavioral model was functioning properly. Therefore, it 
could be converted piece by piece into a hardware model. This 
was accomplished using the subset of Verilog understood by 
Epoch, as described in the next chapter. 


IV. PRC STRUCTURAL MODEL DESIGN PHASE 


This chapter presents the development of the hardware 
model of the PRC. In this phase of the design process, each 
of the behavioral blocks developed in the previous phase was 
implemented with hardware. Converting the blocks in order of 
increasing complexity worked well, making it 
easier to concentrate first on learning how to use Epoch. 

Like the behavioral models, the hardware (structural) 
models are Verilog files. Epoch uses these Verilog files to 
create VLSI layouts. From those layouts, Epoch calculates 
timing information and generates new VerilogOut files with 
this timing information. As each block is converted into 
hardware, the new VerilogOut model can replace the original 
behavioral model in the Testbench for testing with Verilog-XL. 
The following hardware blocks result from using this 
procedure. 

Each section of this chapter also includes a figure 
displaying some important geometric information about the 
module, including surface area and transistor count. This 
information can be obtained from Epoch with the shell command 

geostat -trancount <module name> 

A. PRC 

The top level module is only a connection of each of the 
modules described in the following sections. The geostat 
information is shown in Figure 10. Of particular significance 
are the transistor count and the total chip area. 


Bounding Box: 
9080.748 x 11278.224 microns, 102414707.226 square microns. 
357.510 x 444.025 mils, 158743.[illegible] square mils. 

Number of Pins = 316. 
Number of unique cells = 6. 
Number of Datapaths = 1 
Number of Sub-Glues = 5 
Total Number of Instances = 6 

Total number of nets = 498. 
Total metal1 layer route length = [illegible] microns. 
Total metal2 layer route length = 699802.75 microns. 
Total metal3 layer route length = 0.00 microns. 
Total route length = [illegible] microns. 
Total number of vias = [illegible]. 
Total number of segments = 16989. 
Reading transistor view ... 
Total number of 454310 transistors. 
0.349 Square mils per Transistor. 
2.862 Transistors per square mil. 
Power Dissipation = 4742486.500 micro-watts. 

Figure 10. PRC Geostat Information. [Epoch output] 


B. CONTROLLER 


This module is a Finite State Machine which coordinates 
the actions of all the other functional blocks of the PRC. 
All control signals are synchronous with the system clock. 
HRESET_ causes the Controller to go to the IDLE state. The 
revised state output table (Figure 11) and the revised state 
diagram (Figure 12) give more details. 

Of significance are the wait states added to the state 
diagram of the behavioral model. These changes are boldface 
in the Revised Controller State Output Table. The changes 
were required by the Line Manager, in which there is a 
significant propagation delay for the addresses. This delay 
is described in more detail in the Line Manager section of 
this chapter and is a prime candidate for future work to 
improve this design of the PRC. The geostat information is 
shown in Figure 13. 


Controller State Output Table 

[Table body not recoverable from the source. Legible output 
column headings (order uncertain): test, store, send, new, 
replace, select, predict, hold, fetch. Legible state names, in 
order: WAIT_A, WAIT_B, WAIT_C, WAIT_D, WAIT_E, WAIT_F, 
TEST_CAR(R), SEND_DATA, TEST_NAR, FETCH_DATA, IS_LINE_EMPTY, 
PREDICT_NA, WAIT_G, WAIT_H, WAIT_I, STORE_CAR, TEST_CAR(W), 
FLUSH_LINE. The state name in the first row is illegible.] 

Figure 11. Revised Controller State Output Table. 

Figure 12. Revised Controller State Diagram. 

Bounding Box: 
[illegible] x [illegible] microns, 57773.825 square microns. 
[illegible] x [illegible] mils, 89.550 square mils. 

Number of Pins = 26. 
Number of unique cells = 18. 
Number of Standard cells = 60 
Total Number of Instances = 60 

Total number of nets = 71. 
Total metal1 layer route length = [illegible] microns. 
Total metal2 layer route length = [illegible] microns. 
Total metal3 layer route length = 0.00 microns. 
Total route length = 4146.60 microns. 
Total number of vias = 226. 
Total number of segments = 1074. 
Reading transistor view ... 
Total number of 460 transistors. 
0.195 Square mils per Transistor. 
5.137 Transistors per square mil. 
Power Dissipation = 3665.888 micro-watts. 

Figure 13. Controller Geostat Information. [Epoch output] 


C. SNOOPER 


This module watches the system bus activity and makes 
appropriate reports to the PRC Controller. 

If the transaction is a data-burst read or any kind of 
write and if the address parity is correct, then the read or 
write signal is asserted as appropriate. Also, the address is 
placed in the CAR. The snoop_ignore signal tells this unit to 
ignore the current transaction, because it was initiated by 
the Bus Interface Unit. The snoop_ignore signal must be 
asserted concurrently with the transfer attributes. Reads 
that are not burst or data related are ignored by the PRC. 
The CAR is updated only on transactions relevant to the PRC. 

Due to the two-stage pipelining capability of the PowerPC 
with respect to memory accesses, a second address tenure can 
occur shortly after the first, well before the first data 
tenure is complete. To compensate for this, the read and 
write outputs of the Snooper remain asserted until 
acknowledged by the Controller with hold. The rising edge of 
hold indicates that the read or write signal was received by 
the Controller. The Snooper then can negate these signals, 
but must leave CAR alone until hold is negated. After hold is 
negated, CAR can be updated to the new address. 

In Stage 0, the transfer attributes are latched in 
registers. Combinational logic determines if these transfer 
attributes represent a valid read or a valid write and if the 
address parity is correct. If the transaction is valid and 
one in which the PRC is interested, then Stage 0 raises a 
transaction_waiting signal. 

A Finite State Machine in Stage 1 sits in the IDLE 
state until it receives the transaction_waiting signal. Then 
it latches the signals needed from Stage 0, resets the 
transaction_waiting signal and then waits for the hold signal 
to go low. A high hold signal indicates that the PRC is not 
done with the previous transaction. Once hold goes low, the 
read and write flags are set according to the type of the 
current transaction. Also, the input address is stored in the 
Current Address Register. The FSM then waits for the rising 
edge of hold before returning to the IDLE state where it can 
check if there is another transaction waiting. The geostat 
information is shown in Figure 14. 
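
The hold handshake described for the Snooper's second stage can be sketched as a small clocked state machine. This is only an illustrative Python model, not the Verilog used in the design: the state names WAIT_HOLD_LOW and WAIT_HOLD_HIGH, the class name, and the clock() interface are assumptions introduced here, and the model ignores parity checking and snoop_ignore, which belong to Stage 0. 

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    WAIT_HOLD_LOW = auto()   # hypothetical name: previous transaction still in progress
    WAIT_HOLD_HIGH = auto()  # hypothetical name: waiting for the Controller to acknowledge


class SnooperStage1:
    """Illustrative model of the second Snooper stage (names are assumptions)."""

    def __init__(self):
        self.state = State.IDLE
        self.read = False
        self.write = False
        self.car = None          # Current Address Register
        self._latched = None     # (is_write, address) captured from Stage 0

    def clock(self, transaction_waiting, is_write, address, hold):
        """Advance one clock. Returns True when transaction_waiting should be reset."""
        clear_waiting = False
        if self.state is State.IDLE:
            if transaction_waiting:
                self._latched = (is_write, address)
                clear_waiting = True
                self.state = State.WAIT_HOLD_LOW
        elif self.state is State.WAIT_HOLD_LOW:
            if not hold:
                latched_write, latched_address = self._latched
                self.read = not latched_write
                self.write = latched_write
                self.car = latched_address   # CAR changes only while hold is low
                self.state = State.WAIT_HOLD_HIGH
        elif self.state is State.WAIT_HOLD_HIGH:
            if hold:
                # Rising edge of hold: the Controller has seen the request.
                self.read = False
                self.write = False
                self.state = State.IDLE
        return clear_waiting
```

In this sketch the CAR is written only in the cycle where hold is sampled low, which mirrors the requirement that the CAR be left alone until hold is negated. 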

Bounding Box: 
[illegible] x [illegible] microns, 248793.127 square microns. 
[illegible] x [illegible] mils, 385.630 square mils. 

Number of Pins = 88. 
Number of unique cells = 19. 
Number of Standard cells = 169 
Total Number of Instances = 169 

Total number of nets = 219. 
Total metal1 layer route length = 28547.10 microns. 
Total metal2 layer route length = 14615.39 microns. 
Total metal3 layer route length = 0.00 microns. 
Total route length = 43162.49 microns. 
Total number of vias = 464. 
Total number of segments = 2268. 
Reading transistor view ... 
Total number of 3608 transistors. 
0.107 Square mils per Transistor. 
9.356 Transistors per square mil. 
Power Dissipation = 26722.156 micro-watts. 

Figure 14. Snooper Geostat Information. [Epoch output] 


D. LINE MANAGER 


This structural model uses a high speed RAM (hsram) for 
the MRMA List. The CAR is stored into this RAM on a store or 
fetch_done signal. 

The predicted_ma_list is a register file for storing 
predicted memory addresses. This list is composed of 128 
address registers, 128 equality comparators and 128 Valid 
Status flags. The NAR is stored in this list at the 
fetch_done pulse. If there is a match with the input address 
(in_addr), a priority encoder (ENC_C) determines which line 
matches. 

The Line Replacement Unit determines the next line to be 
replaced whenever the PRC needs to start a new line. It first 
selects invalid lines. If all the lines are valid, then it 
selects lines that have been "aged." A priority encoder 
(ENC_1) chooses the line with the lowest index among all the 
lines that can be replaced. If all lines are valid, the 
output enable (oe) signal of the encoder is used to cause 
aging. A line X can be replaced if the following holds true 
for that line: 

not (X = ActiveLine) AND {not Valid[X] OR (all_lines_valid 
AND Aged[X])} 


Aging is accomplished by the use of a seven-bit counter 
(ager_counter), initially set to zero. When the cause_aging 
signal from the encoder is high, the counter advances. A 
decoder (DEC_B) output causes the appropriate Aged flag to be 
set. 
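
The replacement condition and the aging counter can be illustrated with a short behavioral sketch in Python. It is not the structural Verilog of the design. The function names are hypothetical, and two details are assumptions made for illustration: that an aging step occurs exactly when the priority encoder finds no replaceable line, and that the Aged flag is set before the counter advances. 

```python
def can_replace(x, active_line, valid, aged):
    """Replacement condition for line x, as stated in the text."""
    all_lines_valid = all(valid)
    return x != active_line and (not valid[x] or (all_lines_valid and aged[x]))


def select_line(active_line, valid, aged):
    """Priority encoder: lowest-index replaceable line, or None if there is none."""
    for x in range(len(valid)):
        if can_replace(x, active_line, valid, aged):
            return x
    return None


def age_step(aged, ager_counter):
    """One aging step (assumed order): set the selected Aged flag, then advance.

    With 128 lines the modulo matches a seven-bit counter wrapping around.
    """
    aged[ager_counter] = True
    return (ager_counter + 1) % len(aged)
```

With four lines for brevity, a cache whose lines are all valid and not yet aged yields no candidate until enough aging steps have marked a line other than the active one. 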

Changing values of the CAR or NAR have a propagation 
delay of 25 ns (1.8 cycles) through the input address 
multiplexer (in_addr mux). This required the addition of wait 
states in the Controller before each of the tests. The 
Revised Controller State Output Table and the Revised 
Controller State Diagram found in the Controller section of 
this chapter show the required changes. The geostat 
information is shown in Figure 15. 

Bounding Box: 
6704.064 x 8897.364 microns, 59648499.103 square microns. 
263.940 x 350.290 mils, 92455.359 square mils. 

Number of Pins = 210. 
Number of Blocks = [illegible] 

Total number of nets = 357. 
Total metal1 layer route length = [illegible] microns. 
Total metal2 layer route length = [illegible] microns. 
Total metal3 layer route length = 0.00 microns. 
Total route length = 1487012.19 microns. 
Total number of vias = 2157. 
Total number of segments = 10524. 
Reading transistor view ... 
Total number of 207467 transistors. 
0.446 Square mils per Transistor. 
2.244 Transistors per square mil. 
Power Dissipation = 1777694.500 micro-watts. 

Figure 15. Line Manager Geostat Information. [Epoch output] 


E. PREDICTOR 

The purpose of this module is to calculate the Next 
Address (stored in NAR) based on the Most Recent Memory Access 
(MRMA) and the Current Address (in the CAR). The prediction 
calculation is 

\[ \mathrm{NAR} = 2 \times \mathrm{CAR} - \mathrm{MRMA} \] 

In this structural implementation of the Predictor, the 
predict signal is the latch for the CAR and MRMA registers. 
The subtraction is accomplished as a two's complement addition 
with a high speed adder. 

The CAR is multiplied by two, an arithmetic shift left of 
one bit. The most significant bit of the CAR is not retained, 
as it will not have an effect on the 27-bit output of the 
adder. This will adversely affect address prediction only 
around the midpoint of the four gigabytes of memory. The 
applicable Golden Rule of computer design "is to make the 
common case fast: In making a design tradeoff, favor the 
frequent case over the infrequent case." (Hennessy, 1990) 

A number is negated in two's complement by inverting all 
the bits and adding '1'. The MRMA is negated by inverting all 
its bits. Adding the required '1' is implemented as a 
Carry-In to the adder. 
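
The arithmetic described above (shift the CAR left by one bit, invert the MRMA, and add with a carry-in of one) can be checked with a few lines of Python. This is a sketch under stated assumptions rather than the design's Verilog: the function name is hypothetical, and the operands are treated as 27-bit line addresses obtained by dropping the five low-order bits of a byte address, which is inferred from the 27-bit adder output and the 20h address stride of the burst reads. 

```python
WIDTH = 27                      # adder output width given in the text
MASK = (1 << WIDTH) - 1


def predict_nar(car, mrma):
    """NAR = 2*CAR - MRMA, computed as a two's complement addition."""
    doubled = (car << 1) & MASK   # shift left one bit; the old MSB is dropped
    inverted = ~mrma & MASK       # invert all bits of the MRMA
    return (doubled + inverted + 1) & MASK   # the +1 models the adder carry-in


def line_address(byte_address):
    """Assumed mapping from a byte address to a 27-bit line address."""
    return byte_address >> 5
```

For the Prediction Test addresses, reads at 00h and then 20h give predict_nar(line_address(0x20), line_address(0x00)) << 5 equal to 40h, and reads at 180h and then 1A0h give 1C0h in the same way. 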

The Epoch TACTIC tool reported the propagation delay from 
predict to NAR to be 4.90 ns. The geostat information is 
shown in Figure 16. 

Bounding Box: 
[illegible] x [illegible] microns, 234616.293 square microns. 
[illegible] x [illegible] mils, 363.656 square mils. 

Number of Pins = 113. 
Number of unique cells = 10. 
Number of Blocks = 107 
Total Number of Instances = 107 

Total number of nets = 230. 
Total metal1 layer route length = 12158.68 microns. 
Total metal2 layer route length = 15209.06 microns. 
Total metal3 layer route length = 0.00 microns. 
Total route length = 27367.74 microns. 
Total number of vias = 392. 
Total number of segments = 1793. 
Reading transistor view ... 
Total number of 3027 transistors. 
0.120 Square mils per Transistor. 
8.324 Transistors per square mil. 
Power Dissipation = 27722.887 micro-watts. 

Figure 16. Predictor Geostat Information. [Epoch output] 


F. DATA LIST 


This module stores the data retrieved from memory in 
anticipation of a request by the CPU. The basic memory cell 
is the Epoch part hsramoe (high speed RAM with output enable). 
Since each hsram has a maximum word size of 128 bits, there 
are two hsram parts in parallel to get the required 256-bit 
width. 

An upload signal causes the Data List to store the data 
on data_line into the address specified by ActiveLine. The 
input upload has to be inverted to match the active-low WR 
input of the Epoch hsram component. A download signal causes 
the Data List to assert onto data_line the data in the address 
specified by ActiveLine. This signal also has to be inverted 
for the same reason. 
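
The use of two 128-bit memories side by side to hold one 256-bit line can be pictured with the following Python sketch. It is a behavioral illustration only: the class and attribute names are hypothetical, the choice of which memory holds the upper half is an assumption, and the default of 128 lines is taken from the Line Manager description. 

```python
HALF_BITS = 128                      # maximum word size of one hsram part
HALF_MASK = (1 << HALF_BITS) - 1


class DataList:
    """Two 128-bit-wide memories addressed together to form 256-bit lines."""

    def __init__(self, num_lines=128):
        self.hi = [0] * num_lines    # assumed to hold bits 255..128
        self.lo = [0] * num_lines    # assumed to hold bits 127..0

    def upload(self, active_line, data):
        """Store a 256-bit value at the line selected by ActiveLine."""
        self.hi[active_line] = (data >> HALF_BITS) & HALF_MASK
        self.lo[active_line] = data & HALF_MASK

    def download(self, active_line):
        """Return the 256-bit value stored at the selected line."""
        return (self.hi[active_line] << HALF_BITS) | self.lo[active_line]
```

Both halves share the same address, so a single ActiveLine value selects the complete 256-bit line. 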

Both inverters can probably be removed if the Bus 
Interface Unit makes the upload and download signals active 
low. That could only improve the response time of the data 
memory. 

Epoch calculated the following timing delays: 

download -> hsramoe.DOUT 2.3 ns 
ActiveLine -> hsramoe.DOUT [illegible] ns 

A design alternative is to use the regular speed version, 
ramoe, which gives the following timing delays: 

download -> ramoe.DOUT 4 ns 
ActiveLine -> ramoe.DOUT [illegible] ns 

Using this slower RAM is possible, but would require a 
significant modification to the PRC behavior to handle the 
longer delay and would add a cycle delay to CPU reads when 
there is a hit in the PRC. 

Putting the VerilogOut file of this module into the 
original PRC behavioral model for mixed-mode simulation caused 
a timing error that had to be corrected in the Bus Interface 
Unit behavioral model. After an upload to the Data List, 
data_line must remain valid long enough to meet the data hold 
time requirement of the Epoch part hsramoe. The geostat 
information is shown in Figure 17. 


Bounding Box: 
[illegible] x [illegible] microns, 12359289.299 square microns. 
[illegible] x [illegible] mils, 19156.938 square mils. 

Number of Pins = 282. 
Number of unique cells = 3. 
Number of Standard cells = 2 
Number of Blocks = 2 
Total Number of Instances = 4 

Total number of nets = 269. 
Total metal1 layer route length = 168805.54 microns. 
Total metal2 layer route length = 82952.76 microns. 
Total metal3 layer route length = 0.00 microns. 
Total route length = 251758.30 microns. 
Total number of vias = 728. 
Total number of segments = 2422. 
Reading transistor view ... 
Total number of 214712 transistors. 
0.089 Square mils per Transistor. 
11.208 Transistors per square mil. 
Power Dissipation = 2181481.250 micro-watts. 

Figure 17. Data List Geostat Information. [Epoch output] 


G. BUS INTERFACE 


This module connects the PRC with the system bus. It 
handles the protocol of data transfer in and out of the PRC. 

When this module receives a fetch signal, it latches the 
address in the NAR and requests the bus for a burst read. It 
stores the incoming data until all four beats of the burst have 
been received. Then it uploads the data into the Data List and 
asserts fetch_done. If there is a parity error during the 
fetch, the Bus Interface informs the Controller by asserting 
fetch_abort. Also, the transaction is canceled. 
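
The fetch sequence (collect four beats, then upload, or abort on a parity error) can be sketched in Python as follows. This is only an illustration of the control flow: the function name is hypothetical, the 64-bit beat width is an assumption derived from dividing the 256-bit line by four beats, placing the first beat in the most significant position is also an assumption, and parity is reduced to a per-beat flag rather than computed. 

```python
BEATS_PER_BURST = 4
BEAT_BITS = 64                       # assumption: 256-bit line / four beats
BEAT_MASK = (1 << BEAT_BITS) - 1


def fetch_line(beats):
    """Assemble one line from (data, parity_ok) beats.

    Returns (line, "fetch_done") after four good beats, or
    (None, "fetch_abort") as soon as a beat reports a parity error.
    """
    line = 0
    count = 0
    for data, parity_ok in beats:
        if not parity_ok:
            return None, "fetch_abort"
        line = (line << BEAT_BITS) | (data & BEAT_MASK)
        count += 1
        if count == BEATS_PER_BURST:
            return line, "fetch_done"
    raise ValueError("burst ended before four beats were received")
```

Only a complete, error-free burst produces a line for the Data List; anything else leaves the Data List untouched. 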

When this module receives a send signal, it sends a 
cancel signal (CANX) to the memory module, downloads data from 
the Data List and then sends the data to the CPU. When the 
transfer is finished, it asserts send_done. 

The coordination of these activities is accomplished 
through the use of two Finite State Machines. One acts as an 
address bus master. The other controls the flow of data. The 
geostat information is shown in Figure 18. 


Bounding Box: -6264, -6408, [illegible] 
2252.304 x 1979.388 microns, 4458188.285 square microns. 
88.673 x 77.929 mils, 6910.[illegible] square mils. 

Number of Pins = 448. 
Number of unique cells = 56. 
Number of Standard cells = 1393 
Number of Sub-Glues = 1 
Total Number of Instances = 1394 

Total number of nets = [illegible]. 
Total metal1 layer route length = 676479.94 microns. 
Total metal2 layer route length = 469079.94 microns. 
Total metal3 layer route length = 0.00 microns. 
Total route length = 1145559.88 microns. 
Total number of vias = 9679. 
Total number of segments = 44298. 
Reading transistor view ... 
Total number of 24403 transistors. 
0.283 Square mils per Transistor. 
3.531 Transistors per square mil. 
Power Dissipation = 237269.750 micro-watts. 

Figure 18. Bus Interface Geostat Information. [Epoch output] 


H. TESTING 


The most significant large-scale test of the structural 
model is the Prediction Test, which is similar to the 
Prediction Test of the behavioral model. The test runs the 
same series of CPU transactions to exercise all functional 
blocks of the PRC. The sequence of events for the Prediction 
Test is included in the sequencer4.v file. 

The following steps are required to conduct a test: 

1. Change directories (cd) to the ...verilog/hardware/ 
directory on the Computer Center (CC) system. 

2. At the Unix command prompt, enter the command 
verilog -f verilog_arguments. 


The Verilog-XL output of the test is included in the 
appendices. This test shows that the structural model of the 
PRC performs the desired functions. The output of the 
structural model test is different from the output of the 
behavioral model test mainly because the new structural model 
does not contain the same display commands. These commands 
interfere with the Epoch compilation of the modules. Other 
display commands were added to the Testbench, which is still 
a behavioral model. The displays are sufficient to show that 
the PRC performs as desired. 

While compiling the source files, Verilog-XL reports four 
warnings about implicit wires having no fanin. These wires 
are labeled NC0 and NC1, deriving their initials from "not 
connected." They are unused outputs on a couple of Epoch 
parts. Therefore, these warnings can be ignored. 

The section with comments about SDF Annotation is the 
act of incorporating the Epoch timing analysis into the 
Verilog model. Once that annotation is complete, the actual 
simulation begins. 

The error messages at the beginning of the simulation can 
be ignored. These error messages are generated by Epoch parts 
and indicate improper signal values or timing. All these 
errors occur before the system hard reset and are expected. 
Having those errors after the system hard reset would have 
indicated a real problem. 

Once the system has reset, the CPU starts its series of 
transactions, beginning with reads from addresses 00h and 20h. 
The comment "PRC requested the bus" indicates that the PRC is 
prefetching data. It appears that the prefetch occurs before 
the start of the second CPU transaction, but in reality it 
occurs just after the second CPU address tenure, which is not 
shown in the report. Also not shown, because of the limitation 
of display commands within the PRC, is the data prefetched by 
the PRC. That the data is correct can be seen later in the 
report, when the PRC sends the data to the CPU. 

During the CPU to Memory transactions, there is 60 ns 
between each of the four beats of data. When the CPU reads 
from address 40h, the speed advantage of the PRC is 
demonstrated. Note that there is now only 15 ns between each 
beat. That is the period of the system clock and is therefore 
the maximum possible rate at which the CPU can receive data. 

The write to address 1C0h occurred after the PRC had 
prefetched that data. The PRC should have flushed the 
prefetched data, because it was no longer valid. Later, when 
the CPU performs a read from the same address, it can be seen 
from the read data and from the timing (60 ns per beat) that 
the CPU is getting the data from main memory. In accordance 
with its design, the PRC did not try to give the stale data to 
the CPU. 


V. CAD TOOLS 


The three primary design tools used in the development of 
this PRC were Verilog-XL, cWaves and Epoch. This chapter 
describes some of the particularly useful features of these 
tools and gives some tips for using these tools together. 


A. VERILOG-XL 


Verilog-XL allows the modeling of circuits in a 
programming language. Circuits can be modeled by behavior or 
structure. For the complex design of the PRC, it was 
convenient to start by dividing the design into six blocks and 
then using Verilog to model the behavior of each block. This 
allowed clarification of the required behaviors, deferring the 
search for hardware solutions until after the desired 
behaviors were well defined. 

Currently, Verilog-XL is available only on the Computer 
Center (CC) network. The following steps make it easier to 
use from an Electrical and Computer Engineering (ECE) 
workstation: 

1. Add the following line to the .cshrc file in the ECE 
account: alias [name illegible] 'xhost 1n50204.cc.nps.navy.mil; 
rlogin -l <username> 1n50204.cc.nps.navy.mil'. 

2. Re-source the session by typing "sc <return>". 

3. Type the alias name followed by <return> to log into the 
CC account. 

4. Add the following line to the .cshrc file in the CC 
account: alias remote3 'setenv DISPLAY 
sun3.ece.nps.navy.mil:0.0'. The .cshrc file can 
contain similar lines for other workstations. 

5. Re-source as in Step 2. 

Now the ECE workstation becomes the display for the CC 
workstation. Typing "filemgr &" will call up the CC file 
manager. 

Typing "verilog <return>" should give a list of options 
for use with Verilog-XL and will verify access to the program. 
One particularly useful option is to put all the arguments in 
a file, such as verilog_arguments, and put the following line 
in the CC .cshrc file: 

alias veri 'verilog -f verilog_arguments' 

Typing "veri" is much easier than listing the names of all the 
files that need to be included in the simulation. 

The Cadence online documentation can be accessed with the 
command "openbook &". The Main Menu is the starting point. 
The Alphabetical List on the bottom is the easiest way to find 
the desired information. In this list there is a Verilog-XL 
section which contains hyperlinks to the Verilog-XL Reference 
Manual and Tutorial. 


B. CWAVES 


This tool is indispensable for the analysis of 
complicated circuits. There is nothing like seeing a timing 
diagram to track down design errors. 

The database for the cWaves Viewer is created while 
running the Verilog simulation. The highest level Verilog 
module should have the following two lines in an "initial" 
block: 

$shm_open; 
$shm_probe (<name>, "AS"); 

where <name> is the instance name of the module to be 
observed. More information about these $shm commands can be 
found in the cWaves Reference Manual, which is a little 
difficult to find. It is in the Cadence Online Library 
accessed with "openbook & <return>". Once the Main Menu 
appears, select the Alphabetical List on the bottom. The 
cWaves Reference Manual is filed under Composer (Schematic 
Entry), Design Framework II. Section 4 of this manual is 
particularly useful. 


C. EPOCH 


A circuit designer would find it very convenient if Epoch 
would take as input the raw behavioral models, but it does 
not. Each behavioral block must be converted into a 
structural model. Then, Epoch can automatically generate a 
Very Large Scale Integrated (VLSI) circuit layout using a rule 
set from a specific manufacturer. From the layout, Epoch 
performs a timing analysis of the circuit and generates a new 
Verilog file, which includes the timing information. This new 
file then can replace the behavioral model for resimulation 
with Verilog-XL. This allows the designer to verify each 
block as it is designed. cWaves can be used to track down 
timing errors. 

Epoch is available on the ECE system. To access Epoch, 
add "/tools3/epoch/bin" to the "set path" command in the 
.cshrc file. Also, add "setenv CASCADE /tools3/epoch". 

The Epoch User's Tutorial and the Epoch Verilog Interface 
Reference are both very useful. The former is located at 
/tools3/epoch/data/examples/tutorial. The latter can be 
accessed through pull-down menus in Epoch: 

Help => On-Line Manual... 

Sometimes calling up this manual causes a FrameViewer error, 
but the manual does come up after a slight delay. 

The VerilogOut option proved very useful in the 
development of the PRC. With this option, Epoch creates a new 
Verilog file after laying out a design. The new model can be 
inserted in place of the old behavioral model for simulation 
with Verilog-XL. The Verilog Interface Reference describes 
how this is done. In addition to the procedures described 
there, it will be necessary to take a few extra steps. 

1. If the files must be moved from the vout directory to 
another directory for simulation with Verilog-XL, 
correct the $sdf_annotate path in the .v file. 

2. In all the behavioral files, add a `timescale 
directive like the one in the .v file generated by 
Epoch. This must appear before the "module" 
statement. 

3. It may be necessary to copy primelib.v from 
/tools3/epoch/data/verilog into the CC directory. 


The PowerPC uses bit zero as the most significant bit of 
buses, so it was convenient to follow that convention in this 
PRC design. For example, the PowerPC address bus is 
designated A[0:31]. Unfortunately, this causes a problem with 
the VerilogOut program, which reorders some of the indices and 
connects buses in reverse order. This problem seems to be 
unique to the VerilogOut file generation. The physical layout 
itself gets connected correctly regardless of the index 
numbering convention. Resolving this problem required 
renumbering the indices of all modules used for Epoch input so 
that the most significant bit had the highest index, such as 
A[31:0]. 
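
The renumbering amounts to mirroring each bit index across the width of the bus. The following Python sketch shows the mapping for a 32-bit bus; the function names are hypothetical and are introduced only to illustrate the index arithmetic. 

```python
def ppc_to_msb_high_index(ppc_bit, width=32):
    """Map a PowerPC bit number (0 = most significant) to an index where
    the most significant bit has the highest number."""
    return width - 1 - ppc_bit


def bit_by_ppc_index(value, ppc_bit, width=32):
    """Read the bit of value that PowerPC documentation calls bit ppc_bit."""
    return (value >> ppc_to_msb_high_index(ppc_bit, width)) & 1
```

Under this mapping PowerPC bit A[0] corresponds to index 31 and A[31] corresponds to index 0, so the same physical wire keeps its significance under either naming. 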





VI. CONCLUSIONS AND RECOMMENDATIONS 


A. CONCLUSIONS 


In conclusion, the objective of this research has been 
met. This thesis presents a complete hardware design for the 
PRC. The simulation results show that the PRC can deliver 
data to the CPU at the rate of 8-1-1-1, that is, eight cycles 
for the first beat and one cycle for each of the remaining 
three beats. This performance is better than the performance 
of main memory (8-3-3-3). With a little more work on the 
design, the PRC should be able to deliver data at a rate of 
4-1-1-1. 
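
The burst-rate notation can be turned into total cycle counts with a small calculation. The Python sketch below is only a worked illustration: the function name is hypothetical, and the 5-1-1-1 entry is the improved rate quoted in the Recommendations section. 

```python
def burst_cycles(pattern):
    """Total clock cycles for a four-beat burst written as 'a-b-c-d'."""
    return sum(int(beat) for beat in pattern.split("-"))


rates = [
    ("main memory", "8-3-3-3"),
    ("PRC, current design", "8-1-1-1"),
    ("PRC, without wait states", "5-1-1-1"),
]

for label, pattern in rates:
    print(f"{label}: {pattern} takes {burst_cycles(pattern)} cycles")
```

On these figures a four-beat burst takes 17 cycles from main memory and 11 cycles from the current PRC design, and would take 8 cycles at the 5-1-1-1 rate. 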

Aguilar proposed a design consisting of six modules which 
together would comprise the PRC. He took a bottom-up 
approach, designing four of those six modules, testing each 
independently, but not together. (Aguilar, 1995) As a result, 
the designs of these modules require modifications to enable 
them to function correctly together. Rather than redesigning 
the four modules, the approach taken during this research was 
top-down. That is, a single working behavioral model was 
divided into six behavioral models that functioned together, 
and then each of the six behavioral models was converted into 
a hardware model. The result is still a six-module design, 
but the six modules of this design have different functions 
than the six modules of the design by Aguilar. The top-down 
approach worked exceedingly well to clarify the design and to 
minimize inter-module signal problems. 

This research required a total of three academic 
quarters. The work during the first quarter primarily 
involved studying the problem, analyzing the design 
requirements, and learning about the PowerPC system. Two more 
quarters were required for the creation of the design, one 
quarter each for the behavioral design phase and the 
structural design phase. 

Epoch and Verilog-XL proved reliable and highly useful 
during the development of this hardware design. Verilog-XL 
performed the simulations necessary to verify the design. 
Epoch performed the VLSI circuit layout and timing analysis 
that were required by Verilog-XL in order to produce 
simulation results that could be considered accurate. 

Simulations with Verilog-XL are conveniently short while 
testing small modules. However, simulations of the entire PRC 
design typically ran for half an hour on a Sun SPARC-10 
workstation. Similarly, on small designs Epoch runs fast enough 
that a user could wait at the workstation. To compile 
complex modules Epoch requires much more time. For example, 
Epoch takes over an hour to compile the Bus Interface of the 
PRC and more than three hours to compile the entire PRC. 

Both Verilog-XL and Epoch have functions and options 
which are not readily apparent. That problem is compounded by 
inadequate indexes in the user's manuals for each of these 
tools. On the other hand, the tutorials are very helpful for 
revealing some of those functions and options. 

Some of the options in Epoch require significant study 
before use. The pull-down menus in Epoch could be better 
organized. Both of these characteristics work to make Epoch 
less user-friendly than it should be. 


B. RECOMMENDATIONS 


As with any complex design, there is much more that a 
designer could do to improve this PRC. This section describes 
some areas of potential future research related to this 
hardware design. 

The first recommendation is to consider including the 
Arbiter on the PRC chip. This PRC design was developed for a 
PowerPC-603 microprocessor system, in which both the PRC and 
the CPU are candidates for bus mastership. This requires that 
there be a bus arbitration unit to prevent both devices from 
trying to use the bus simultaneously. The bus arbitration 
unit is a simple device whose function can be fulfilled with 
a single finite state machine (FSM). It would be very easy to 
add this FSM to the PRC chip, eliminating the requirement to 
fabricate a separate integrated circuit chip. 

The second recommendation 1s in regards to improving the 
Line Manager design. The Line Manager is the block that 
requires the wait states in the Controller State Diagram. The 
impact of these wait states is a delay of three cycles in 
determining if there is a hit within the PRC. Finding a way 
of eliminating these wait states could improve the speed at 
which the PRC delivers the first beat of data to the CPU and 
the speed at which the PRC prefetches data from main memory. 
Specifically, the performance would improve from 8-1-1-1 to 5- 
1-1-1. There is a strong chance that Epoch would prove useful 
in this endeavor. Epoch has timing analysis routines and can 
perform layouts in such away as to minimize propagation 
delays for critical signals. Epoch also has automatic buffer 
Sizing algorithms which could be used to ensure the output 


Signais of each part are buffered sufficiently to drive their 


loads. These capabilities of Epoch do require considerable 
CPU time. For example, running an automatic compilation on 
the current design of the Bus Interface Unit takes over an 
hour of actual CPU time on a Sun SPARC 10 workstation if the 
buffer sizing option is selected. 
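
To put the 8-1-1-1 and 5-1-1-1 figures in concrete terms, the following back-of-the-envelope calculation assumes a four-beat burst and the 15 ns bus clock period used by the testbench in Appendix B. Both values are assumptions taken from the simulation setup, not measured results.

```latex
\begin{align*}
N_{\text{current}}  &= 8 + 1 + 1 + 1 = 11 \text{ cycles}, &
t_{\text{current}}  &= 11 \times 15\,\text{ns} = 165\,\text{ns},\\
N_{\text{improved}} &= 5 + 1 + 1 + 1 = 8 \text{ cycles}, &
t_{\text{improved}} &= 8 \times 15\,\text{ns} = 120\,\text{ns}.
\end{align*}
```

Under these assumptions, the first beat would arrive after 75 ns instead of 120 ns, and a complete burst would finish 45 ns sooner.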

The next recommendation is to study the rest of the
design for critical paths. With Epoch as an analysis tool, it 
should be uncomplicated to analyze the entire PRC for critical 
timing paths. Some timing limitations may be improved through 
the buffer-sizing and timing-critical layout capabilities of 
Epoch. Other timing limitations may require modifying the
design. The current PRC design includes only parts that were 
available in the Epoch library. It may be possible to design 
parts that outperform the Epoch parts. 

The final recommendation regards fabrication. If the
design detailed in this thesis is to be fabricated, it must
undergo two steps. First, the power rails should be studied
using Epoch to determine if there is a requirement for
additional power and ground rails. Second, the design must be
put inside a pad ring. Epoch may be able to create the pad
ring automatically with minimal intervention by the designer.


APPENDIX A. LAYOUTS


This appendix contains the VLSI (Very-Large-Scale-Integrated)
circuit layouts for the PRC. These layouts were all generated
by Epoch.


Figure A1. The PRC expanded to the first level. The blocks in
the lower left corner, in order of decreasing size, are the Bus
Interface, Predictor, Snooper, and Controller. [Epoch output]





Figure A2. The PRC fully expanded. [Epoch output]





Figure A3. The Controller. [Epoch output]





Figure A4. The Snooper. [Epoch output]











Figure A5. The Line Manager expanded one level. The bottom
portion is shown in more detail in the next figure. [Epoch output]





Figure A6. Detail of the bottom portion of the Line Manager
shown in the previous figure. [Epoch output]








Figure A7. The Line Manager fully expanded. [Epoch output]









Figure A8. The Predictor fully expanded. [Epoch output]





Figure A9. The Data List fully expanded. [Epoch output]





Figure A10. The Bus Interface fully expanded. [Epoch output]





Figure A11. The Line Replacement Unit fully expanded. [Epoch
output]





Figure A12. The Predicted Memory Address List fully
expanded. [Epoch output]





Figure A13. The 128-to-7 Priority Encoder fully expanded.
[Epoch output]








Figure A14. The Predicted Address Register fully expanded.
[Epoch output]





APPENDIX B. TESTBENCH VERILOG FILES 


This appendix contains the Verilog files for the
Testbench. They are all behavioral models, used together to
test the PRC design. The files are located on the Computer
Center system at joshua_u2/jrrobert/thesis/verilog/behavior.


A. TESTBENCH 


/*****************************************************************************
 * TESTBENCH
 * Filename: testbench.v
 * Author: Joseph R. Robert, Jr.
 * Date: 24AUG95
 * Revised: 10JAN96
 *
 * Purpose: This module is the highest level in the design hierarchy. It
 * emulates a complete computer system, composed of
 *   1. cpu: a PowerPC-603 microprocessor.
 *   2. ram: random access memory.
 *   3. arbiter: the bus arbitration unit.
 *   4. prc: the predictive read cache under design.
 *
 * System configuration and features:
 *   Single CPU
 *   64-bit data bus
 *   No out-of-order split-bus transactions.
 *   Synchronous interface: all I/O sampled on rising edge of bus clock.
 *   66 MHz system clock, 66 MHz CPU clock.
 *
 * Simulation should be done with a time unit = 1 ns.
 *****************************************************************************/
module testbench;


// Signal Declarations - conforms to PowerPC-603 notation

// Address Arbitration
wire CPU_BR_, //Bus Request
     CPU_BG_; //Bus Grant
tri1 ABB_; //Address Bus Busy
tri1 TS_; //Transfer Start (memory only, not I/O)

// Address bus
wire [0:31] A; //Address (note Motorola's reverse notation)
wire [0:3] AP; //Address Parity
wire APE_; //Address Parity Error

// Transfer attributes
wire [0:4] TT; //Transfer Type
wire [0:2] TSIZ; //Transfer Size
wire [0:1] TC; //Transfer Code
tri1 TBST_; //Transfer burst
wire GBL_,
     CI_,
     WT_,
     CSE;

// Address Termination
tri1 AACK_; //Address Acknowledge
reg ARTRY_; //Address Retry

// Data Arbitration
wire CPU_DBG_; //Data Bus Grant
reg DBWO_; //Data Bus Write Only
tri1 DBB_; //Data Bus Busy

// Data Transfer
wire [0:63] D; //Data
wire [0:7] DP; //Data Parity
wire DPE_, //Data Parity Error
     DBDIS_; //Data Bus Disable

// Data Termination
tri1 TA_; //Transfer Acknowledge
reg DRTRY_; //Data Retry
reg TEA_; //Transfer Error Acknowledge

// System control
reg HRESET_; //Hard Reset
wire PRC_BR_; //PRC Bus Request
wire CANX;


//Declare variables, constants, parameters
parameter TRUE = 1'b1,
          FALSE = 1'b0,
          hi = 1'b1,
          low = 1'b0;

//Initialize values.
initial
begin
  DBWO_ = hi; //Limits CPU to in-order transactions.
  TEA_ = hi; //Only asserted for nonrecoverable bus error events.
  ARTRY_ = hi; //Retries used only with multiprocessor or multi-
  DRTRY_ = hi; // level memory systems.
  HRESET_ = hi;
end


//define system clock, 66 MHz, T = 15 ns.
reg clk;
initial clk = 1;
always
begin
  #7 clk = 0;
  #8 clk = 1;
end


//Connect parts

cpu CPU1(CPU_BR_,CPU_BG_,ABB_,TS_,A,AP,APE_,TT,TSIZ,TC,TBST_,GBL_,
         CI_,WT_,CSE,AACK_,ARTRY_,CPU_DBG_,DBWO_,DBB_,D,DP,
         DPE_,DBDIS_,TA_,DRTRY_,TEA_,clk);

memory MEM1(ABB_,TS_,A,AP,APE_,TT,TSIZ,TC,TBST_,GBL_,CI_,WT_,CSE,AACK_,
            DBWO_,DBB_,D,DP,DPE_,DBDIS_,TA_,TEA_,CANX,clk);

arbiter ARB1(CPU_BR_,CPU_BG_,CPU_DBG_,PRC_BR_,PRC_BG_,PRC_DBG_,
             ABB_,DBB_,clk);

prc PRC1(CPU_BR_,PRC_BR_,PRC_BG_,ABB_,TS_,A,AP,APE_,TT,TSIZ,TC,
         TBST_,AACK_,PRC_DBG_,DBB_,D,DP,DPE_,TA_,HRESET_,CANX,clk);


//run simulation
initial
begin
  //$shm_open;
  #5 HRESET_ = low; //Reset entire system.
  #5 HRESET_ = hi;
  //#4000;
  //$shm_probe(PRC1,"AS");
  #152000 $finish; //Adjust this time according to the instructions
                   //in the sequencers.
end


endmodule 


B. CPU


/*****************************************************************************
 * PowerPC-603 CPU
 * Filename: cpu.v
 * Author: Joseph R. Robert, Jr.
 * Date: 24AUG95
 * Revised: 10JAN96
 *
 * Purpose: This module emulates the PowerPC-603 microprocessor. Note that
 * most signals are active low. This makes it slightly more difficult to work
 * one's way through all the double negatives in this code's conditional
 * statements, but makes it much easier to correlate against the timing diagrams
 * in the PowerPC-603 User Manual. This model uses the same notations for
 * signals that connect to other modules.
 * This module uses the sequencer module to determine the operations the CPU
 * will perform. This model of the PowerPC-603 is capable of performing reads,
 * writes, burst reads, and burst writes. It handles bus arbitration just like
 * the '603 including the pipelined address tenures. Please refer to the
 * PowerPC-603 User Manual for a detailed description of the nature and timing
 * of each signal.
 *
 *****************************************************************************/
module cpu (BR_,BG_,ABB_,TS_,A,AP,APE_,TT,TSIZ,TC,TBST_,GBL_,CI_,WT_,CSE,AACK_,
            ARTRY_,DBG_,DBWO_,DBB_,D,DP,DPE_,DBDIS_,TA_,DRTRY_,TEA_,clk);


// Signals are defined in system.v.

input BG_,AACK_,DBG_,DBWO_,DBDIS_,TA_,ARTRY_,DRTRY_,TEA_,clk;
output BR_,APE_,CI_,WT_,CSE,DPE_;
inout [0:31] A;
inout [0:63] D;
inout [0:7] DP;
inout [0:4] TT;
inout [0:3] AP;
inout [0:2] TSIZ;
inout [0:1] TC;
inout ABB_,TS_,TBST_,GBL_,DBB_;

reg BR_,APE_,CI_,WT_,CSE,DPE_;
tri [0:31] A;
tri [0:63] D;
tri [0:7] DP;
tri [0:4] TT;
tri [0:3] AP;
tri [0:2] TSIZ;
tri [0:1] TC;
tri ABB_,TS_,TBST_,GBL_,DBB_;

//declare variables, constants, parameters
parameter TRUE = 1'b1,
          FALSE = 1'b0,
          hi = 1'b1,
          low = 1'b0,
          trace = FALSE;


//Address related 

wire [0:31] seq_addr; 

reg [0:31] addr_reg, address[0:1]; 

reg [0:31] a_reg; 
assign A = a_reg; 

reg [0:3] ap_reg, addr_parity_in, addr_parity_calc; 
assign AP = ap_reg; 


//Data related 

reg [0:63] data [0:1]; 

wire [0:63] seq_data; 

reg [0:63] d_reg, load_data, data_reg; 
assign D = d_reg; 

reg [0:255] line_reg, line [0:1];

wire [0:255] seq_line; 

reg [0:7] dp_reg, d_parity_in, d_parity_calc; 
assign DP = dp_reg; 


//Other external control signals 
reg Transfer_start [0:1]; 
reg abb_reg_, dbb_reg_, ts_reg_, tbst_reg_; 
assign ABB_ = abb_reg_; 
assign TS_ =ts_reg_; 
assign DBB_ =dbb_reg_; 
assign TBST_ = tbst_reg_; 


reg [0:4] Transfer_type [0:1]; 
wire [0:4] seq_Transfer_type; 
reg [0:4] tt_reg; 
assign TT = tt_reg; 
parameter //for Transfer_type 
none = 5'bz,
write = 5'b00010, //02 
write_atomic = 5'b10010, //12 
read = 5'b01010, //0A
read_atomic = 5'b11010, //1A
burst_write = 5'b00110, //06
burst_read = 5'b01110, //0E
burst_read_atomic = 5'b11110; //1E 


reg [0:2] Transfer_size [0:1];
wire [0:2] seq_Transfer_size;
reg [0:2] tsiz_reg;
assign TSIZ = tsiz_reg;

reg [0:1] Transfer_code [0:1];
wire [0:1] seq_Transfer_code;
reg [0:1] tc_reg;
assign TC = tc_reg;
parameter //for Transfer_code
  data_transfer = 2'b00,
  touch_load = 2'b01,
  instruction_fetch = 2'b10,
  reserved = 2'b11;


//Other internal control signals
reg need_bus_;
wire need_bus_trigger_;
reg AB_Master, DB_Master, Addr_termination;
wire qual_BG_, qual_DBG_;
reg [0:7] index;
wire parked;
wire pp;
reg dpp;
event transfer_acknowledged;


//initialize signals
initial
begin
  a_reg <= 32'bz;
  ap_reg <= 4'bz;
  addr_parity_in <= 4'bz;
  addr_parity_calc <= 4'bz;
  addr_reg <= 32'bz;
  address[0] <= 32'bz;
  address[1] <= 32'bz;

  data[0] <= 64'bz;
  data[1] <= 64'bz;
  d_reg <= 64'bz;
  line[0] <= 256'bz;
  line[1] <= 256'bz;
  line_reg <= 256'bz;
  d_parity_in <= 8'bz;
  d_parity_calc <= 8'bz;
  dp_reg <= 8'bz;

  APE_ <= 'bz;
  BR_ <= hi;
  CI_ <= hi;
  CSE <= low;
  DPE_ <= 'bz;
  WT_ <= hi;
  abb_reg_ <= 'bz;
  dbb_reg_ <= 'bz;
  ts_reg_ <= 'bz;
  tbst_reg_ <= 'bz;
  Transfer_type[0] <= none;
  Transfer_type[1] <= none;
  tt_reg <= none;
  Transfer_size[0] <= 0;
  Transfer_size[1] <= 0;
  tsiz_reg <= 'bz;
  Transfer_code[0] <= reserved;
  Transfer_code[1] <= reserved;
  tc_reg <= 2'bz;
  Transfer_start[0] <= FALSE;
  Transfer_start[1] <= FALSE;

  AB_Master <= FALSE;
  DB_Master <= FALSE;
  Addr_termination <= FALSE;
  need_bus_ <= hi;
  dpp <= 0;
end


//

sequencer SEQ1(seq_Transfer_size,clk,pp,seq_addr,seq_data,seq_line,
               seq_Transfer_type,seq_Transfer_code,need_bus_trigger_,ABB_);

always @(negedge need_bus_trigger_)
begin
  address[pp] <= seq_addr;
  data[pp] <= seq_data;
  line[pp] <= seq_line;
  Transfer_type[pp] <= seq_Transfer_type;
  Transfer_size[pp] <= seq_Transfer_size;
  Transfer_code[pp] <= seq_Transfer_code;
end


//
//ADDRESS BUS TENURE
// *** 1. Address bus arbitration

always @(negedge need_bus_trigger_)
  need_bus_ = low;

//Parked means that the CPU can take the bus as soon as it needs it.
assign parked = (!BG_ & ABB_ & ARTRY_);

//If CPU needs bus, it needs to assert BR_ only if not parked.
always @(posedge clk)
  if (BR_ == hi)
    BR_ = #7 ~(need_bus_==low & parked==FALSE);

assign qual_BG_ = ~(need_bus_==low & parked==TRUE);

//Assume mastership
always @(posedge clk)
  if (qual_BG_ == low)
  begin
    abb_reg_ = #7 low;
    AB_Master = TRUE;
    BR_ = hi;
    need_bus_ <= #2 hi;
  end


// *** 2. Address Transfer

always @(posedge clk)
  if (qual_BG_ == low)
  begin
    addr_reg = address[pp];
    addr_parity_calc[0] <= ~^addr_reg[0:7];
    addr_parity_calc[1] <= ~^addr_reg[8:15];
    addr_parity_calc[2] <= ~^addr_reg[16:23];
    addr_parity_calc[3] <= ~^addr_reg[24:31];
    ts_reg_ = #7 low;
    Transfer_start[pp] <= TRUE;
    a_reg <= address[pp];
    ap_reg <= addr_parity_calc;
    tt_reg <= Transfer_type[pp];
    tsiz_reg <= Transfer_size[pp];
    tc_reg <= Transfer_code[pp];
    if (Transfer_type[pp] == burst_read
        || Transfer_type[pp] == burst_write)
      tbst_reg_ <= low;
    //insert other address transfer characteristics here.
  end

always @(posedge clk)
  if (AB_Master & TS_==low)
  begin
    ts_reg_ = #7 hi;
    wait (AACK_==low);
    Addr_termination = TRUE;
  end

always @(posedge clk)
  if (Addr_termination)
  begin
    #7 ts_reg_ <= 'bz;
    a_reg <= 'bz;
    ap_reg <= 'bz;
    tt_reg <= 'bz;
    tc_reg <= 'bz;
    tsiz_reg <= 'bz;
    tbst_reg_ <= 'bz;
    //insert other addr transfer characteristics here.
    abb_reg_ <= #2 hi;
    abb_reg_ <= #8 'bz;
    AB_Master = FALSE;
    Addr_termination = FALSE;
  end

//
//DATA BUS TENURE

assign qual_DBG_ = ~(!DBG_ & DBB_ & DRTRY_);


always @(posedge clk)
begin
  if (TA_ == low)
    -> transfer_acknowledged;
end

always
begin
  #2 dpp = ~dpp;

  case(Transfer_type[dpp])
    none: begin end

//Note: TS is an implied data bus request. CPU can assume mastership if it
//has a qualified data bus grant.

    read: begin
      //wait for qualified data bus grant and transfer start.
      wait(qual_DBG_==low & Transfer_start[dpp]);
      @(posedge clk) //assume data bus mastership
      dbb_reg_ <= #7 low;
      @(transfer_acknowledged) //latch data and terminate read
      data[dpp] <= D;
      data_reg <= D;
      d_parity_in <= DP;
      Transfer_type[dpp] <= none;
      Transfer_code[dpp] <= reserved;
      Transfer_start[dpp] = FALSE;
      d_parity_calc[0] <= ~^data_reg[0:7];
      d_parity_calc[1] <= ~^data_reg[8:15];
      d_parity_calc[2] <= ~^data_reg[16:23];
      d_parity_calc[3] <= ~^data_reg[24:31];
      d_parity_calc[4] <= ~^data_reg[32:39];
      d_parity_calc[5] <= ~^data_reg[40:47];
      d_parity_calc[6] <= ~^data_reg[48:55];
      d_parity_calc[7] <= ~^data_reg[56:61];
      if (trace) begin
        $display("CPU read %h from address %h.",
                 data[dpp],address[dpp]);
        $display(" Completed at time %d",$time);
      end
      dbb_reg_ = #4 hi;
      dbb_reg_ = #8 'bz;
      if (d_parity_in != d_parity_calc)
      begin
        $display("CPU: data parity error.");
        $display(" Calculated parity: %b",
                 d_parity_calc);
        $display(" Received parity: %b",
                 d_parity_in);
      end
    end


    write: begin
      data_reg = data[dpp];
      d_parity_calc[0] <= ~^data_reg[0:7];
      d_parity_calc[1] <= ~^data_reg[8:15];
      d_parity_calc[2] <= ~^data_reg[16:23];
      d_parity_calc[3] <= ~^data_reg[24:31];
      d_parity_calc[4] <= ~^data_reg[32:39];
      d_parity_calc[5] <= ~^data_reg[40:47];
      d_parity_calc[6] <= ~^data_reg[48:55];
      d_parity_calc[7] <= ~^data_reg[56:61];
      //wait for qualified data bus grant and transfer start.
      wait(qual_DBG_==low & Transfer_start[dpp]);
      @(posedge clk) //assume data bus mastership
      dbb_reg_ = #7 low;
      d_reg <= data[dpp];
      dp_reg <= d_parity_calc;
      @(transfer_acknowledged) //terminate write
      d_reg <= #7 64'bz;
      dp_reg <= #7 8'bz;
      Transfer_type[dpp] <= none;
      Transfer_start[dpp] = FALSE;
      if (trace) begin
        $display("CPU wrote %h to address %h.",
                 data[dpp],address[dpp]);
        $display(" Completed at time %d", $time);
      end
      dbb_reg_ = #4 hi;
      dbb_reg_ = #8 'bz;
    end


    burst_read: begin
      //wait for qualified data bus grant and transfer start.
      wait(qual_DBG_==low & Transfer_start[dpp]);
      @(posedge clk) //assume data bus mastership
      dbb_reg_ <= #7 low;

      if (trace)
        $display("CPU started read from address %h at time %d.",
                 address[dpp],$time);
      repeat (4) begin
        @(transfer_acknowledged) //latch beat
        data[dpp] <= D;
        data_reg <= D;
        d_parity_in = DP;
        #1 if (trace)
          $display(" CPU read: %h at %d",data[dpp], $time);
        d_parity_calc[0] <= ~^data_reg[0:7];
        d_parity_calc[1] <= ~^data_reg[8:15];
        d_parity_calc[2] <= ~^data_reg[16:23];
        d_parity_calc[3] <= ~^data_reg[24:31];
        d_parity_calc[4] <= ~^data_reg[32:39];
        d_parity_calc[5] <= ~^data_reg[40:47];
        d_parity_calc[6] <= ~^data_reg[48:55];
        d_parity_calc[7] = ~^data_reg[56:61];
        #2 if (d_parity_in != d_parity_calc)
        begin
          $display("CPU: data parity error.");
          $display(" Calculated parity: %b",
                   d_parity_calc);
          $display(" Received parity: %b",
                   d_parity_in);
        end
      end

      Transfer_type[dpp] <= none;
      Transfer_code[dpp] <= reserved;
      Transfer_start[dpp] <= FALSE;
      dbb_reg_ = #4 hi;
      dbb_reg_ = #8 'bz;
    end


    burst_write: begin
      //wait for qualified data bus grant and transfer start.
      wait(qual_DBG_==low & Transfer_start[dpp]);
      if (trace)
        $display("CPU started write to address %h at time %d.",
                 address[dpp],$time);
      @(posedge clk) //assume data bus mastership
      dbb_reg_ = #6 low;
      line_reg = line[dpp];
      data_reg = line_reg[0:63];
      d_parity_calc[0] <= ~^data_reg[0:7];
      d_parity_calc[1] <= ~^data_reg[8:15];
      d_parity_calc[2] <= ~^data_reg[16:23];
      d_parity_calc[3] <= ~^data_reg[24:31];
      d_parity_calc[4] <= ~^data_reg[32:39];
      d_parity_calc[5] <= ~^data_reg[40:47];
      d_parity_calc[6] <= ~^data_reg[48:55];
      d_parity_calc[7] = ~^data_reg[56:61];
      dp_reg <= d_parity_calc;
      d_reg = line_reg[0:63];
      #1 if (trace)
        $display(" CPU write beat 1: %h at %d",d_reg,$time);
      @(transfer_acknowledged); //first beat done

      data_reg = line_reg[64:127];
      d_parity_calc[0] <= ~^data_reg[0:7];
      d_parity_calc[1] <= ~^data_reg[8:15];
      d_parity_calc[2] <= ~^data_reg[16:23];
      d_parity_calc[3] <= ~^data_reg[24:31];
      d_parity_calc[4] <= ~^data_reg[32:39];
      d_parity_calc[5] <= ~^data_reg[40:47];
      d_parity_calc[6] <= ~^data_reg[48:55];
      d_parity_calc[7] = ~^data_reg[56:61];
      dp_reg <= d_parity_calc;
      #7 d_reg = line_reg[64:127];
      #1 if (trace)
        $display(" CPU write beat 2: %h at %d",d_reg,$time);
      @(transfer_acknowledged); //second beat done

      data_reg = line_reg[128:191];
      d_parity_calc[0] <= ~^data_reg[0:7];
      d_parity_calc[1] <= ~^data_reg[8:15];
      d_parity_calc[2] <= ~^data_reg[16:23];
      d_parity_calc[3] <= ~^data_reg[24:31];
      d_parity_calc[4] <= ~^data_reg[32:39];
      d_parity_calc[5] <= ~^data_reg[40:47];
      d_parity_calc[6] <= ~^data_reg[48:55];
      d_parity_calc[7] = ~^data_reg[56:61];
      dp_reg <= d_parity_calc;
      #7 d_reg = line_reg[128:191];
      #1 if (trace)
        $display(" CPU write beat 3: %h at %d",d_reg,$time);
      @(transfer_acknowledged); //third beat done

      data_reg = line_reg[192:255]; //last 64 bits of the line, matching d_reg below
      d_parity_calc[0] <= ~^data_reg[0:7];
      d_parity_calc[1] <= ~^data_reg[8:15];
      d_parity_calc[2] <= ~^data_reg[16:23];
      d_parity_calc[3] <= ~^data_reg[24:31];
      d_parity_calc[4] <= ~^data_reg[32:39];
      d_parity_calc[5] <= ~^data_reg[40:47];
      d_parity_calc[6] <= ~^data_reg[48:55];
      d_parity_calc[7] = ~^data_reg[56:61];
      dp_reg <= d_parity_calc;
      #7 d_reg = line_reg[192:255];
      #1 if (trace)
        $display(" CPU write beat 4: %h at %d",d_reg,$time);

      @(transfer_acknowledged); //fourth beat done
      d_reg <= #7 64'bz;
      dp_reg <= #7 8'bz;
      line_reg <= #7 256'bz;
      Transfer_type[dpp] <= #7 none;
      Transfer_code[dpp] <= #7 reserved;
      Transfer_start[dpp] <= #7 FALSE;
      dbb_reg_ = #4 hi;
      dbb_reg_ = #8 'bz;
    end


    default: $display("CPU module has bad TT[%b] = %b",dpp,
                      Transfer_type[dpp]," at time %d.",$time);
endcase 
end 


endmodule 


C. ARBITER


/*****************************************************************************
 * BUS ARBITRATION UNIT
 * Filename: arbiter.v
 * Author: Joseph R. Robert, Jr.
 * Date: 24AUG95
 * Revised: 10JAN96
 *
 * Purpose: This module emulates the system's external bus arbitration unit.
 * It is implemented as a Finite State Machine.
 * There are only two possible bus masters in this system: the CPU and the PRC.
 * Also, the address bus and data bus are each arbitrated for independently,
 * though the data bus arbitration occurs after the corresponding address bus
 * arbitration.
 * If a unit wants the address bus, it asserts BR_. If the bus is available,
 * the arbiter asserts BG_ back to that unit, which can then take mastership by
 * asserting ABB_. When it is done with the address bus, it negates ABB_.
 * It is assumed that if a unit wanted the address bus, it will also want the
 * data bus. "Address only" transactions will not occur in this system, since
 * there is no external cache or multiprocessors. Therefore, after asserting
 * BG_ to the requesting unit, the arbiter asserts DBG_ on the next cycle.
 * BG_ and DBG_ are both asserted until the requesting unit takes mastership,
 * unless the requesting unit withdraws its request by negating BR_.
 * If there are no pending bus requests, the arbiter "parks" the CPU by
 * granting it the busses. This reduces memory access time for the CPU. If the
 * CPU is parked, and then the PRC requests the bus, the CPU is unparked, and
 * the arbiter can then grant the bus to the PRC.
 * The PowerPC can conduct a second address tenure long before the first data
 * tenure is complete. This pipelining has a maximum depth of two transactions,
 * meaning that a third address tenure will not start before the first data
 * tenure is complete. The Memory Unit in this Testbench is capable of handling
 * that situation. However, adding the PRC to the system creates the
 * possibility that the PRC will initiate a third address tenure before the
 * first of two CPU transactions is complete. This situation is handled by this
 * Arbiter, which keeps track of the pipelining depth. It will not grant the
 * address bus to any unit if that address tenure would put a third transaction
 * in the pipeline. Rather, the arbiter will stall until the data tenure from
 * the first transaction is complete, and then will grant the address bus to the
 * requesting unit.
 *
 *****************************************************************************/
module arbiter (CPU_BR_,CPU_BG_,CPU_DBG_,PRC_BR_,PRC_BG_,PRC_DBG_,
                ABB_,DBB_,clk);


output CPU_BG_, CPU_DBG_, PRC_BG_, PRC_DBG_;
input CPU_BR_, PRC_BR_, ABB_, DBB_, clk;

reg CPU_BG_, CPU_DBG_, PRC_BG_, PRC_DBG_;
wire CPU_BR_, PRC_BR_, clk;

//Declare variables, constants, parameters
parameter TRUE = 1'b1,
          FALSE = 1'b0,
          hi = 1'b1,
          low = 1'b0;
reg [1:0] requests; //concatenated input signals
reg [1:0] depth;
wire stall;

//Finite State Machine variables and parameters
reg [2:0] state, next_state;
parameter start = 1,
          grant_cpu_a = 2,
          park_cpu = 3,
          grant_cpu_d = 4,
          grant_prc_a = 5,
          wait_for_prc = 6,
          grant_prc_d = 7;
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//initialize outputs 
initial 
begin 
CEULBG. <= hi; 
CPU_DBG_ <=hi; 
PRC_BG_ <=h; 
PRC_DBG_ <=hi; 
State <= Start; 
next_state <= start; 
requests <= 'b11; 
depth <= 0; 
end 


/{Track depth of pipeline 
always @(posedge ABB_) 
begin 
depth = depth + 1; 
end 


always @(posedge DBB_) 
begin 
depth = depth - 1; 
end 


assign stall = (depth > 1); 


//
//Arbitration 


always
begin
    wait (!stall);
    #5 state = next_state;

    #1 case (state)
        start: //1
        begin
            CPU_BG_ <= hi;
            CPU_DBG_ <= hi;
            PRC_BG_ <= hi;
            PRC_DBG_ <= hi;
            @(posedge clk) requests = {CPU_BR_,PRC_BR_};
            case (requests)
                2'b00: next_state = grant_cpu_a;
                2'b01: next_state = grant_cpu_a;
                2'b10: next_state = grant_prc_a;
                2'b11: next_state = grant_cpu_a;
            endcase
        end


        grant_cpu_a: //2
        begin
            CPU_BG_ <= low;
            CPU_DBG_ <= hi;
            PRC_BG_ <= hi;
            PRC_DBG_ <= hi;
            @(posedge clk);
            next_state = park_cpu;
        end


        park_cpu: //3
        begin
            CPU_BG_ <= low;
            CPU_DBG_ <= low;
            PRC_BG_ <= hi;
            PRC_DBG_ <= hi;
            @(posedge clk) requests = {CPU_BR_,PRC_BR_};
            case (requests)
                2'b00: next_state = park_cpu;
                2'b01: next_state = park_cpu;
                2'b10: next_state = grant_cpu_d;
                2'b11: next_state = park_cpu;
            endcase
        end


        grant_cpu_d: //4
        begin
            CPU_BG_ <= hi;
            CPU_DBG_ <= low;
            PRC_BG_ <= hi;
            PRC_DBG_ <= hi;
            @(posedge clk) requests = {CPU_BR_,PRC_BR_};
            case (requests)
                2'b00: next_state = park_cpu;
                2'b01: next_state = park_cpu;
                2'b10: next_state = grant_prc_a;
                2'b11: next_state = park_cpu;
            endcase
        end


        grant_prc_a: //5
        begin
            CPU_BG_ <= hi;
            CPU_DBG_ <= hi;
            PRC_BG_ <= low;
            PRC_DBG_ <= hi;
            @(posedge clk);
            next_state = wait_for_prc;
        end


        wait_for_prc: //6
        begin
            CPU_BG_ <= hi;
            CPU_DBG_ <= hi;
            PRC_BG_ <= low;
            PRC_DBG_ <= low;
            @(posedge clk) requests = {CPU_BR_,PRC_BR_};
            case (requests)
                2'b00: next_state = wait_for_prc;
                2'b01: next_state = grant_cpu_d;
                2'b10: next_state = wait_for_prc;
                2'b11: next_state = grant_prc_d;
            endcase
        end


        grant_prc_d: //7
        begin
            CPU_BG_ <= hi;
            CPU_DBG_ <= hi;
            PRC_BG_ <= hi;
            PRC_DBG_ <= low;
            wait (DBB_ == hi);
            @(posedge clk) requests = {CPU_BR_,PRC_BR_};
            case (requests)
                2'b00: next_state = grant_cpu_a;
                2'b01: next_state = grant_cpu_a;
                2'b10: next_state = grant_prc_a;
                2'b11: next_state = grant_cpu_a;
            endcase
        end

        default: $display("state error in module arbiter");
    endcase
end


endmodule 
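
The depth-tracking rule described in the arbiter header can be modeled in a few lines of software. The following Python sketch is illustrative only and is not part of the thesis testbench; the class and method names are hypothetical. It assumes, as the listing does, that each completed address tenure adds one transaction to the pipeline and each completed data tenure removes one.

```python
class PipelineDepthTracker:
    """Software model of the arbiter's pipeline-depth bookkeeping (hypothetical names)."""

    def __init__(self):
        self.depth = 0

    def address_tenure_done(self):
        # Mirrors "always @(posedge ABB_) depth = depth + 1".
        self.depth += 1

    def data_tenure_done(self):
        # Mirrors "always @(posedge DBB_) depth = depth - 1".
        self.depth -= 1

    @property
    def stall(self):
        # Mirrors "assign stall = (depth > 1)": no new grant while two
        # transactions are already in the pipeline.
        return self.depth > 1


tracker = PipelineDepthTracker()
tracker.address_tenure_done()        # first CPU transaction
tracker.address_tenure_done()        # second transaction, pipeline is now full
stalled_when_full = tracker.stall    # a third address grant must wait
tracker.data_tenure_done()           # first data tenure completes
stalled_after_drain = tracker.stall  # the arbiter may grant again
```

The model leaves out the grant state machine itself; it only shows why a third address tenure is held off until the first data tenure ends.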


B. MEMORY
/******************************************************************************
* RANDOM ACCESS MEMORY
* Filename: memory.v
* Author: Joseph R. Robert, Jr.
* Date: 24AUG95
* Revised: 10JAN96
*
* Purpose: This module emulates the system's main memory. For simulation
* efficiency, the memory has only enough physical address space for four burst
* reads, that is, 128 bytes. The address bus width allows a virtual address
* space of 4 G-bytes. Accesses to addresses past the first 128 bytes map to
* within the first 128 bytes.
* The time required for memory accesses is determined by Delay1 and
* Delay2. Delay1 is the delay, in cycles, required for the initial access.
* Delay2 is the delay required for each successive beat of four-beat
* operations. Set them both to 0 for fastest memory response. Set them to 8
* and 3 respectively for realistic memory response of a 60 ns DRAM. Do not set
* Delay2 > Delay1. That will not represent a realistic memory response, and
* will probably cause this module to act weird.
* There is a two-stage pipeline involved with memory accesses, such that a
* memory tenure can be started while the previous data tenure is still active.
* To accomplish this, some signals have [0:1] in their declaration, and are
* indexed using pp and dpp, which are the address pipeline position pointer
* and the data pipeline position pointer, respectively.
* To keep this model simple, a single-beat read will always return a
* single byte of data, regardless of TSIZ, in byte lane 0, which is different
* from the way the PowerPC really operates. See Table 10-4 on pg. 10-15 of
* the PowerPC-603 Users Manual for actual alignment. This simplification is
* irrelevant to the performance of the PRC, which deals only with burst
* operations.
* It is important to note that this memory module had to have one feature
* that is not typical of memory modules. It has a CANX input which cancels the
* current read operation. It is through this signal that the PRC stops the
* memory module from delivering data to the CPU when the PRC already has the
* data.
*
******************************************************************************/
module memory (ABB_,TS_,A,AP,APE_,TT,TSIZ,TC,TBST_,GBL_,CI_,WT_,CSE,AACK_,
               DBWO_,DBB_,D,DP,DPE_,DBDIS_,TA_,TEA_,CANX,clk);


// Signals are defined in system.v. 

output AACK_,DBDIS_,TA_,APE_;
input [0:1] TC;
input DBWO_,CI_,WT_,CSE,TEA_,DPE_,CANX,clk;
input [0:31] A;
inout [0:63] D;
inout [0:7] DP;
input [0:4] TT;
inout [0:3] AP;
inout [0:2] TSIZ;
inout ABB_,TS_,TBST_,GBL_,DBB_;

wire [0:31] A;
wire CI_,WT_,CSE,TEA_,DPE_,ARTRY_;
reg AACK_,APE_,DBDIS_,DRTRY_;
tri [0:63] D;
tri [0:7] DP;
tri [0:3] AP;
tri [0:2] TSIZ;
tri ABB_,TS_,TBST_,GBL_,DBB_,TA_;
reg [0:63] d_reg, data;
assign D = d_reg;
//
//Declare variables, constants, parameters
parameter TRUE = 1'b1,
          FALSE = 1'b0,
          hi = 1'b1,
          low = 1'b0,
          Size = 128,  //Size of memory in bytes.
          Length = 7,  //Length of physical address in bits.
          Delay1 = 8,  //Delay for address translation.
          Delay2 = 3;  //Delay between successive beats.


parameter //for Transfer_type
          none = 5'bzzzzz,
          write = 5'b00010,
          write_atomic = 5'b10010,
          read = 5'b01010,
          read_atomic = 5'b11010,
          burst_write = 5'b00110,
          burst_read = 5'b01110,
          burst_read_atomic = 5'b11110;


reg [0:31] virtual_addr, index;
reg [0:3] addr_parity_calc,addr_parity_in;
reg [0:Length-1] pa_reg, physical_addr [0:1];
reg [0:7]
    mem [0:Size-1],
    mem_reg; //Memory data register
reg [0:4] Transfer_type [0:1];
reg [0:2] Transfer_size [0:1];
reg burst [0:1];
reg [0:1] i, burst_start;
reg pp,dpp; //current pipeline and data pipeline positions
reg abort;
reg ta_reg_;
assign TA_ = ta_reg_;


//initialize memory
initial
begin
    abort <= FALSE;
    AACK_ <= hi;
    addr_parity_calc <= 3'bz;
    addr_parity_in <= 3'bz;
    DBDIS_ <= hi;
    ta_reg_ <= 'bz;
    d_reg <= 64'bz;
    Transfer_type[0] <= none;
    Transfer_type[1] <= none;
    Transfer_size[0] <= 'bz;
    Transfer_size[1] <= 'bz;
    burst[0] <= 'bz;
    burst[1] <= 'bz;
    pp <= 1'b1;
    dpp <= 1'b1;
    for (index = 0; index<Size; index=index+1)
        mem[index] = index;
end
//
//ADDRESS TENURE
always @(posedge clk)
begin
    if (ABB_ == low)
    begin
        //latch address and attributes
        pp = ~pp;
        Transfer_type[pp] <= TT;
        Transfer_size[pp] <= TSIZ;
        burst[pp] <= TBST_;
        //insert other attributes here.
        addr_parity_in <= AP;
        virtual_addr = A;
        addr_parity_calc[0] <= ~^virtual_addr[0:7];
        addr_parity_calc[1] <= ~^virtual_addr[8:15];
        addr_parity_calc[2] <= ~^virtual_addr[16:23];
        addr_parity_calc[3] <= ~^virtual_addr[24:31];
        physical_addr[pp] = virtual_addr[32-Length:31];
        if (addr_parity_in != addr_parity_calc)
        begin
            $display("Memory: address parity error.");
            $display(" Calculated parity: %b",addr_parity_calc);
            $display(" Received parity: %b",addr_parity_in);
        end
        AACK_ = #7 low;
        wait (AACK_==hi);
    end
end
always @(posedge clk)
begin
    if (AACK_ == low)
        AACK_ = #7 hi;
end


//DATA TENURE
always @(posedge clk) 
begin 
if (CANX == hi) 
abort = TRUE; 
end 


always
begin
    #1 dpp = ~dpp;
    #1 case (Transfer_type[dpp])
        none: begin end

        read:
        begin
            repeat(Delay1)@(posedge clk);
            #7 ta_reg_ <= low;
            d_reg[0:7] <= mem[physical_addr[dpp]];
            Transfer_size[dpp] <= 'bz;
            @(posedge clk)
            Transfer_type[dpp] <= none;
            #7 ta_reg_ = 'bz;
            d_reg[0:7] <= 'bz;
        end


        write:
        begin
            repeat(Delay1)@(posedge clk);
            #7 ta_reg_ <= low;
            @(posedge clk)
            //latch data
            data = D;
            mem[physical_addr[dpp]] <= data[0:7];
            #7 ta_reg_ = 'bz;
            Transfer_size[dpp] <= 'bz;
            Transfer_type[dpp] <= none;
        end


        burst_read:
        begin
            //find critical double-word
            #2 pa_reg = physical_addr[dpp];
            burst_start = pa_reg[Length-5:Length-4];
            //align to cache line
            pa_reg[Length-5:Length-1] = 5'b00000;
            physical_addr[dpp] = pa_reg;
            if (!abort) if (Delay1-Delay2-1 >= 0)
                repeat(Delay1-Delay2-1)@(posedge clk);
            for (index=0; index<4; index=index+1)
            begin
                if (!abort) repeat(Delay2)@(posedge clk);
                if (Delay1-Delay2!=0 || index!=0) @(posedge clk);
                if (!abort) begin
                    #7 ta_reg_ <= low;
                    i = burst_start+index; //i is mod 4
                    d_reg[ 0: 7]<=mem[physical_addr[dpp]+8*i];
                    d_reg[ 8:15]<=mem[physical_addr[dpp]+8*i+1];
                    d_reg[16:23]<=mem[physical_addr[dpp]+8*i+2];
                    d_reg[24:31]<=mem[physical_addr[dpp]+8*i+3];
                    d_reg[32:39]<=mem[physical_addr[dpp]+8*i+4];
                    d_reg[40:47]<=mem[physical_addr[dpp]+8*i+5];
                    d_reg[48:55]<=mem[physical_addr[dpp]+8*i+6];
                    d_reg[56:63]<=mem[physical_addr[dpp]+8*i+7];
                    if (Delay2!=0)
                    begin
                        ta_reg_ <= #13 'bz;
                        d_reg <= #13 64'bz;
                    end
                end
                else
                    index <= 5;
            end

            @(posedge clk)
            ta_reg_ <= #7 'bz;
            d_reg <= #7 64'bz;
            Transfer_size[dpp] <= 'bz;
            Transfer_type[dpp] <= none;
            abort <= FALSE;
        end


        burst_write:
        begin
            //burst-writes are always performed in order
            if (Delay1-Delay2 >= 0)
                repeat(Delay1-Delay2)@(posedge clk);
            for (index=0; index<4; index=index+1)
            begin
                repeat(Delay2)@(posedge clk);
                #7 ta_reg_ <= low;
                i = index;
                @(posedge clk) //latch data
                data = D;
                mem[physical_addr[dpp]+8*i] <= data[ 0: 7];
                mem[physical_addr[dpp]+8*i+1] <= data[ 8:15];
                mem[physical_addr[dpp]+8*i+2] <= data[16:23];
                mem[physical_addr[dpp]+8*i+3] <= data[24:31];
                mem[physical_addr[dpp]+8*i+4] <= data[32:39];
                mem[physical_addr[dpp]+8*i+5] <= data[40:47];
                mem[physical_addr[dpp]+8*i+6] <= data[48:55];
                mem[physical_addr[dpp]+8*i+7] <= data[56:63];
                if (Delay2!=0)
                    ta_reg_ <= #7 'bz;
            end
            ta_reg_ <= #7 'bz;
            data <= #7 64'bz;
            Transfer_size[dpp] <= 'bz;
            Transfer_type[dpp] <= none;
            @(posedge clk);
        end

        default: $display("Memory module received bad TT[%d] = %b",dpp,
                          Transfer_type[dpp]," at time %d", $time);
    endcase
end


endmodule 
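
The address handling in the burst_read branch (wrap into 128 bytes, align to the 32-byte line, then deliver the critical double-word first) can be checked with a small software model. The Python sketch below is illustrative only; the function and constant names are hypothetical and it models addresses as plain integers rather than big-endian bit vectors.

```python
LINE_BYTES = 32      # four beats of eight bytes each
MEMORY_BYTES = 128   # the Size parameter in memory.v


def burst_read_offsets(virtual_addr):
    """Return the byte offset of each of the four beats, critical double-word first."""
    physical = virtual_addr % MEMORY_BYTES          # addresses wrap into 128 bytes
    line_base = physical - (physical % LINE_BYTES)  # align to the cache line
    start = (physical // 8) % 4                     # index of the critical double-word
    # Beat k reads double-word (start + k) mod 4, as "i = burst_start+index" does.
    return [line_base + 8 * ((start + beat) % 4) for beat in range(4)]


offsets = burst_read_offsets(0x10000050)
aligned_offsets = burst_read_offsets(0)
```

For the address 0x10000050 the physical address is 80, so the requested double-word at offset 80 is returned first and the burst wraps around within the line that starts at offset 64.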







APPENDIX C. PRC BEHAVIOR FILES


The files in this appendix are from the behavioral design
phase. They include the Verilog behavioral models of the
PRC and the testing results. The files are located on the
Computer Center system at
joshua_u2/jrrobert/thesis/verilog/behavior.


A. PRC
/******************************************************************************
* Predictive Read Cache
* Filename: prc.v
* Author: Joseph R. Robert, Jr.
* Date: 12OCT95
* Revised: 10JAN96
*
* Purpose: This module emulates the predictive read cache.
*
******************************************************************************/
module prc (CPU_BR_,BR_,BG_,ABB_,TS_,A,AP,APE_,TT,TSIZ,TC,TBST_,AACK_,
            DBG_,DBB_,D,DP,DPE_,TA_,HRESET_,CANX,clk);


// Signals are defined in system.v. Notations follow conventions used in
// PowerPC Users Manual.

input CPU_BR_,BG_,AACK_,DBG_,TA_,HRESET_,clk;
output [0:1] TC;
output BR_,APE_,DPE_,CANX;
inout [0:31] A;
inout [0:63] D;
inout [0:7] DP;
inout [0:4] TT;
inout [0:3] AP;
inout [0:2] TSIZ;
inout ABB_,TS_,TBST_,DBB_;

wire [0:1] TC;
wire BR_,APE_,DPE_,CANX;
wire [0:31] A;
wire [0:63] D;
wire [0:7] DP;
wire [0:4] TT;
wire [0:3] AP;
wire [0:2] TSIZ;
wire ABB_,TS_,TBST_,DBB_;

//declare variables, constants, parameters
parameter TRUE = 1'b1,
          FALSE = 1'b0,
          hi = 1'b1,
          low = 1'b0;

//Other internal control signals
wire CAR_latch, predict,snoop_ignore;
wire [0:255] DATALINE;
wire [0:26] CAR;  // current address register
wire [0:26] NAR;  // next address register
wire [0:26] MRMA; // most recent memory access
wire [0:6] ActiveLine;
wire [0:1] BURSTSTART;


//Connect parts
bus_interface BIU1(NAR,BURSTSTART,BG_,CPU_BR_,AACK_,DBG_,
                   send,fetch,clk,BR_,upload,download,fetch_done,
                   send_done,CANX,snoop_ignore,
                   DATALINE,D,A,DP,DPE_,TT,TSIZ,ABB_,TS_,TBST_,DBB_,TA_,HRESET_);

snooper SNP1(A,AP,TT,TC,TS_,snoop_ignore,hold,clk,CAR,BURSTSTART,read,write);

controller CON1(HRESET_,read,write,hit,send_done,fetch_done,
                line_empty,a_select,test,predict,store,
                flush,send,hold,new_replace,fetch,clk);

predictor PRE1(MRMA,CAR,predict,NAR);

line_mgr LM1(CAR,NAR,HRESET_,a_select,test,fetch_done,flush,store,
             new_replace,MRMA,ActiveLine,line_empty,hit);

datalist DL1(DATALINE,ActiveLine,upload,download);


endmodule 
B. CONTROLLER
/******************************************************************************
* CONTROLLER
* Filename: controller.v
* Author: Joseph R. Robert, Jr.
* Date: 21DEC95
* Revised: 05JAN96
*
* Purpose: This module is a Finite State Machine which coordinates the actions
* of all the other functional blocks of the PRC. All control signals are
* synchronous with the system clock. HRESET_ causes the Controller to go to
* the IDLE state. See state diagram and state output tables.
*
******************************************************************************/
module controller (HRESET_,read,write,hit,send_done,fetch_done,
                   line_empty,a_select,test,predict,store,
                   flush,send,hold,new_replace,fetch,clk);


input HRESET_,read,write,hit,send_done,fetch_done,line_empty,clk;
output a_select,test,predict,store,flush,send,hold,new_replace,fetch;

reg a_select,test,predict,store,flush,send,hold,new_replace,fetch;

//declare variables, constants, parameters
parameter TRUE = 1'b1,
          FALSE = 1'b0,
          hi = 1'b1,
          low = 1'b0,
          trace = FALSE;

//Finite State Machine variable and parameters
reg [0:3] state, next_state;
reg [0:2] inputs3;
reg [0:1] inputs2;
reg input1;
parameter idle = 0,
          test_car_r = 1,
          send_data = 2,
          test_nar = 3,
          fetch_data = 4,
          is_line_empty = 5,
          predict_na = 6,
          store_car = 7,
          test_car_w = 8,
          flush_line = 9;


//initialize signals
initial
begin
    state <= idle;      //The state variables must be initialized to
    next_state <= idle; //avoid the default error message.
end
//FINITE STATE MACHINE
always @(negedge HRESET_)
begin
    state = idle;
    next_state <= idle;
    wait (HRESET_ == hi);
end


always
begin
    #5 state = next_state;
    if (trace)
        $display(" Controller entered state %d.",state);

    #1 case (state)

        idle: //0
        begin
            //a_select <= low;
            test <= low;
            predict <= low;
            store <= low;
            flush <= low;
            send <= low;
            hold <= low;
            new_replace <= low;
            fetch <= low;
            @(posedge clk) inputs2 = {read,write};
            if (HRESET_ == low)
                next_state = idle;
            else
                case (inputs2)
                    2'b00: next_state = idle;
                    2'b01: next_state = test_car_w;
                    2'b10: next_state = test_car_r;
                    2'b11: next_state = test_car_w; //This should not happen.
                endcase
        end

        test_car_r: //1
        begin
            a_select <= low; //CAR
            test <= hi;
            predict <= low;
            store <= low;
            flush <= low;
            send <= low;
            hold <= hi;
            new_replace <= low;
            fetch <= low;
            @(posedge clk) input1 = hit;
            case (input1)
                1'b0: next_state = is_line_empty;
                1'b1: next_state = send_data;
            endcase
        end


        send_data: //2
        begin
            //a_select <= low;
            test <= low;
            predict <= hi;
            store <= low;
            flush <= low;
            send <= hi;
            hold <= hi;
            new_replace <= low;
            fetch <= low;
            @(posedge clk) input1 = send_done;
            case (input1)
                1'b0: next_state = send_data;
                1'b1: next_state = test_nar;
            endcase
        end


        test_nar: //3
        begin
            a_select <= hi; //NAR
            test <= hi;
            predict <= low;
            store <= low;
            flush <= low;
            send <= low;
            hold <= hi;
            new_replace <= low;
            fetch <= low;
            @(posedge clk) inputs3 = {hit,read,write};
            case (inputs3)
                3'b000: next_state = fetch_data;
                3'b001: next_state = idle;
                3'b010: next_state = idle;
                3'b011: next_state = idle; //This should not happen.
                3'b100: next_state = idle;
                3'b101: next_state = idle;
                3'b110: next_state = idle;
                3'b111: next_state = idle; //This should not happen.
            endcase
        end


        fetch_data: //4
        begin
            a_select <= hi; //NAR
            test <= low;
            predict <= low;
            store <= low;
            flush <= low;
            send <= low;
            hold <= hi;
            new_replace <= low;
            fetch <= hi;
            @(posedge clk) input1 = fetch_done;
            case (input1)
                1'b0: next_state = fetch_data;
                1'b1: next_state = idle;
            endcase
        end


        is_line_empty: //5
        begin
            //a_select <= low;
            test <= low;
            predict <= low;
            store <= low;
            flush <= low;
            send <= low;
            hold <= hi;
            new_replace <= low;
            fetch <= low;
            @(posedge clk) input1 = line_empty;
            case (input1)
                1'b0: next_state = predict_na;
                1'b1: next_state = store_car;
            endcase
        end


        predict_na: //6
        begin
            //a_select <= low;
            test <= low;
            predict <= hi;
            store <= low;
            flush <= low;
            send <= low;
            hold <= hi;
            new_replace <= hi;
            fetch <= low;
            @(posedge clk) next_state = test_nar;
        end

        store_car: //7
        begin
            a_select <= low; //CAR
            test <= low;
            predict <= low;
            store <= hi;
            flush <= low;
            send <= low;
            hold <= hi;
            new_replace <= low;
            fetch <= low;
            @(posedge clk) next_state = idle;
        end


        test_car_w: //8
        begin
            a_select <= low; //CAR
            test <= hi;
            predict <= low;
            store <= low;
            flush <= low;
            send <= low;
            hold <= hi;
            new_replace <= low;
            fetch <= low;
            @(posedge clk) input1 = hit;
            case (input1)
                1'b0: next_state = idle;
                1'b1: next_state = flush_line;
            endcase
        end


        flush_line: //9
        begin
            //a_select <= low;
            test <= low;
            predict <= low;
            store <= low;
            flush <= hi;
            send <= low;
            hold <= hi;
            new_replace <= low;
            fetch <= low;
            @(posedge clk) next_state = idle;
        end


        default:
        begin
            $display("state error in module controller.");
            $display(" state = %b.",state);
        end
    endcase
end


endmodule 
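
The state transitions of the Controller can be summarized as a pure next-state function. The Python sketch below is an illustrative model under stated assumptions, not part of the thesis files: it uses the state names from the listing as strings, ignores HRESET_ and all output signals, and the function name and keyword arguments are hypothetical.

```python
def next_state(state, read=0, write=0, hit=0, send_done=0, fetch_done=0,
               line_empty=0):
    """Return the Controller's next state for one clock edge (HRESET_ ignored)."""
    if state == "idle":
        if write:
            return "test_car_w"   # also taken when read and write are both set
        return "test_car_r" if read else "idle"
    if state == "test_car_r":
        return "send_data" if hit else "is_line_empty"
    if state == "send_data":
        return "test_nar" if send_done else "send_data"
    if state == "test_nar":
        # Fetch only on a NAR miss with no new bus transaction pending.
        return "idle" if (hit or read or write) else "fetch_data"
    if state == "fetch_data":
        return "idle" if fetch_done else "fetch_data"
    if state == "is_line_empty":
        return "store_car" if line_empty else "predict_na"
    if state == "predict_na":
        return "test_nar"
    if state in ("store_car", "flush_line"):
        return "idle"
    if state == "test_car_w":
        return "flush_line" if hit else "idle"
    raise ValueError("unknown state: " + state)


# Walk the read-hit path: the data is sent, then the next line is prefetched.
read_hit_path = ["idle"]
for inputs in ({"read": 1}, {"hit": 1}, {"send_done": 1}, {}, {"fetch_done": 1}):
    read_hit_path.append(next_state(read_hit_path[-1], **inputs))
```

Walking a read hit through this function shows the intended sequence: test the CAR, send the data, test the NAR, fetch the predicted line, and return to idle.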
C. SNOOPER
/******************************************************************************
* SNOOPER
* Filename: snooper.v
* Author: Joseph R. Robert, Jr.
* Date: 21DEC95
* Revised: 05JAN96
*
* Purpose: This module watches the system bus activity, and makes appropriate
* reports to the PRC Controller.
* If the transaction is a data burst read or any kind of write, and if the
* address parity is correct, then the read or write signal is asserted as
* appropriate, and the address is placed in the CAR. The snoop_ignore signal
* tells this unit to ignore the current transaction, because it was initiated
* by the Bus Interface Unit. The snoop_ignore signal must be asserted
* concurrently with the transfer attributes.
* Reads that are not burst or data related are ignored by the PRC. The CAR
* is updated only on transactions relevant to the PRC.
* Due to the two-stage pipelining capability of the PowerPC, with respect to
* memory accesses, a second address tenure can occur shortly after the first,
* well before the first data tenure is complete. To compensate for this, the
* read and write outputs of the Snooper will remain asserted until acknowledged
* by the Controller with hold. The rising edge of hold indicates that the read
* or write signal was received by the Controller. The Snooper can then negate
* these signals, but must leave CAR alone until hold is negated. After hold is
* negated, CAR can be updated to the new address.
*
******************************************************************************/
module snooper (A,AP,TT,TC,TS_,snoop_ignore,hold,clk,CAR,BURSTSTART,
                read_flag,write_flag);


input [0:31] A;
input [0:3] AP;
input [0:4] TT;
input [0:1] TC;
input TS_,snoop_ignore,hold,clk;
output [0:26] CAR;
output [0:1] BURSTSTART;
output read_flag,write_flag;

reg [0:26] CAR;
reg [0:1] BURSTSTART;
reg read_flag,write_flag;

//declare variables, constants, parameters
parameter TRUE = 1'b1,
          FALSE = 1'b0,
          hi = 1'b1,
          low = 1'b0;

//Address related
reg [0:31] address;
reg [0:3] addr_parity,addr_parity_calc;


//Other external control signals
reg [0:4] Transfer_type;
parameter //for Transfer_type
          none = 5'bz,
          write = 5'b00010,             //02
          write_atomic = 5'b10010,      //12
          read = 5'b01010,              //0A
          read_atomic = 5'b11010,       //1A
          burst_write = 5'b00110,       //06
          burst_read = 5'b01110,        //0E
          burst_read_atomic = 5'b11110; //1E
reg [0:1] Transfer_code;
parameter //for Transfer_code
          data_transfer = 2'b00,
          touch_load = 2'b01,
          instruction_fetch = 2'b10,
          reserved = 2'b11;

reg ignore;


//Other internal control signals
reg valid_read_0, valid_read_1; //The numbers indicate the pipeline stage.
reg valid_write_0, valid_write_1;
tri parity_valid;
reg Transaction_waiting;


//initialize variables
initial
begin
    CAR <= 27'bz;
    BURSTSTART <= 2'bz;
    read_flag <= low;
    write_flag <= low;
    address <= 32'bz;
    addr_parity <= 4'bz;
    addr_parity_calc <= 4'bz;
    Transfer_type <= none;
    Transfer_code <= none;
    ignore <= low;
    Transaction_waiting <= low;
end


//BEHAVIOR

//Calculate address parity.
always @(address)
begin
    addr_parity_calc[0] <= ~^address[0:7];
    addr_parity_calc[1] <= ~^address[8:15];
    addr_parity_calc[2] <= ~^address[16:23];
    addr_parity_calc[3] <= ~^address[24:31];
end

assign parity_valid = (addr_parity_calc == addr_parity);


//If there is a transaction,
// and that transaction is a data burst read or any kind of write
// and the transaction is not initiated by the PRC itself,
// and if the address parity is correct
//then report the type of transaction to the Controller.

always @(posedge clk)
begin
    if (TS_ == low)
    begin //latch address and attributes in stage 0.
        address <= A;
        Transfer_type <= TT;
        Transfer_code <= TC;
        ignore <= snoop_ignore;
        addr_parity = AP;
        #2 valid_read_0 = Transfer_code == data_transfer &
                          (Transfer_type == burst_read |
                           Transfer_type == burst_read_atomic);
        valid_write_0 = Transfer_type == write |
                        Transfer_type == write_atomic |
                        Transfer_type == burst_write;
        #4 if (!ignore & parity_valid & (valid_read_0 | valid_write_0))
            Transaction_waiting = hi;
    end
end


always @(posedge hold) 
begin 
read_flag <= low; 
write_flag <= low; 
end 


always
begin
    wait(Transaction_waiting);
    valid_read_1 = valid_read_0;
    valid_write_1 = valid_write_0;
    Transaction_waiting = #2 low;
    wait(!hold);
    if (valid_read_1)
    begin
        read_flag <= hi;
        CAR = address[0:26];
        BURSTSTART = address[27:28];
    end
    else if (valid_write_1)
    begin
        write_flag <= hi;
        CAR = address[0:26];
        BURSTSTART = address[27:28];
    end
end


endmodule 
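
The Snooper's parity check applies a reduction XNOR to each byte of the address, which yields odd parity: each parity bit is 1 when its byte contains an even number of ones. The Python sketch below is an illustrative model only; the function name is hypothetical, and it assumes the PowerPC convention used in the listing that AP[0] covers A[0:7], the most significant byte.

```python
def address_parity(address):
    """Return [AP0, AP1, AP2, AP3], one odd-parity bit per byte of a 32-bit address."""
    parity_bits = []
    for shift in (24, 16, 8, 0):      # AP0 covers A[0:7], the most significant byte
        byte = (address >> shift) & 0xFF
        ones = bin(byte).count("1")
        # Reduction XNOR (~^): 1 when the number of ones in the byte is even.
        parity_bits.append(1 if ones % 2 == 0 else 0)
    return parity_bits


zero_address_parity = address_parity(0x00000000)
mixed_address_parity = address_parity(0x01000003)
```

A transaction is reported to the Controller only when the bits computed this way equal the AP value latched from the bus.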


D. LINE MANAGER
/******************************************************************************
* LINE MANAGER
* Filename: line_mgr.v
* Author: Joseph R. Robert, Jr.
* Date: 21DEC95
* Revised: 05JAN95
*
* Purpose: This module contains the address list, status flags for each line
* (Valid, Aged), a general status flag (line_empty), the line replacement unit,
* and a couple of pointers (ActiveLine, ReplaceLine).
* The MRMA output is always the MRMA of the ActiveLine. The line_empty
* flag indicates that the currently active line has no addresses in it yet, and
* therefore cannot be used by the PRC to make a prediction.
* The input a_select determines which address input is used for a particular
* operation. The two address inputs are the CAR and the NAR.
* When the Line Manager receives a test signal, it compares the input address
* with the contents of the PredMA List. If there is a match with the CAR, it
* asserts the hit signal, and changes the ActiveLine pointer to the line number
* of the match.
* If there is a miss with the CAR, then the ActiveLine switches to the same
* line pointed to by ReplaceLine.
* If, during a test, there is a match with the NAR, hit is asserted, and the
* value in ActiveLine is irrelevant since it will not be used. If there is a
* miss with the NAR, the ActiveLine must remain unchanged from the test.
* The fetch_done signal from the Bus Interface Unit causes the NAR to be
* stored in PredMA[ActiveLine], the CAR to be stored in MRMA[ActiveLine], the
* Valid flag to be set, and the Aged flag to be reset.
* The flush signal causes the current ActiveLine to become invalid by setting
* Valid[ActiveLine] = 0.
* The store signal causes the input address to be stored into the MRMA of the
* ActiveLine. This is only used for the first address in a new line. Store
* also causes the line_empty flag to be reset.
*
******************************************************************************/
module line_mgr (CAR,NAR,HRESET_,a_select,test,fetch_done,flush,store,
                 new_replace,MRMA_out,ActiveLine,line_empty,hit);


input [0:26] CAR,NAR;
input HRESET_,a_select,test,fetch_done,flush,store,new_replace;
output [0:26] MRMA_out;
output [0:6] ActiveLine;
output line_empty,hit;

reg [0:26] MRMA_out;
reg [0:6] ActiveLine;
reg line_empty,hit;

//declare variables, constants, parameters
parameter TRUE = 1'b1,
          FALSE = 1'b0,
          hi = 1'b1,
          low = 1'b0;

//Address related
reg [0:26] in_addr;

//Data structure
reg [0:26] PredMA [0:127],
           MRMA [0:127],
           PredMA_reg,MRMA_reg;
reg Valid [0:127],
    Aged [0:127];

//Other internal control signals
reg [0:7] i1,i2,i3;
reg [0:6] ReplaceLine;
reg match,temp,all_lines_are_valid,done;


//initialize variables
initial
begin
    for (i1=0; i1<=127; i1=i1+1)
    begin
        PredMA[i1] <= 27'b0;
        MRMA[i1] <= 27'b0;
    end
end

//BEHAVIOR


always @(negedge HRESET_)
begin
    for (i1=0; i1<=127; i1=i1+1)
    begin
        Valid[i1] <= low;
        Aged[i1] <= low;
    end
    ActiveLine <= 0;
    ReplaceLine <= 0;
    line_empty <= hi;
    wait (HRESET_ == hi);
end


always @(a_select or CAR or NAR) //address multiplexer
begin
    if (a_select==0)
        in_addr = CAR;
    else
        in_addr = NAR;
end

always @(ActiveLine)
begin
    MRMA_out = MRMA[ActiveLine];
    $display("Line_mgr selected new ActiveLine = %d at %d",ActiveLine,$time);
end


always @(posedge test)
begin
    hit = low;
    match = low;
    #2 i2 = 0;
    while (!match & i2<128)
        if (PredMA[i2] == in_addr & Valid[i2])
            match = hi;
        else
            i2 = i2 + 1;
    #2 if (match & a_select==0) //a match with the CAR
    begin
        hit <= hi;
        ActiveLine <= i2;
    end
    else if (match & a_select==1) //a match with the NAR
        hit <= hi;
    else if (!match & a_select==0) //a miss with the CAR
        ActiveLine <= ReplaceLine;
    else if (!match & a_select==1) //a miss with the NAR
    begin end // Do nothing.
end


always @(posedge fetch_done)
begin
    MRMA[ActiveLine] <= CAR;
    MRMA_out <= CAR;
    PredMA[ActiveLine] <= NAR;
    Valid[ActiveLine] <= hi;
    Aged[ActiveLine] = low;
end

always @(posedge flush)
begin
    Valid[ActiveLine] = low;
    $display("Line manager flushed line %d at time %d.",ActiveLine,$time);
end

always @(posedge store)
begin
    MRMA[ActiveLine] = in_addr;
    MRMA_out = MRMA[ActiveLine];
    line_empty = 0;
end

/******************************************************************************
* LINE REPLACEMENT UNIT
*
* ReplaceLine always points to the line to be replaced at the next PRC miss.
* As soon as the PRC starts predicting the first address for a line it
* asserts new_replace, and the Line Replacement Unit can then find a new line
* to mark as the next ReplaceLine. It searches sequentially for the next line
* with invalid data and marks that line as the next to be replaced. If all
* lines contain valid data, then it scans for the next line that is "aged",
* indicated by a set Aged flag. As it scans for an aged line, it sets the Aged
* bits in the lines it passes. Therefore, as it wraps around in search of an
* aged line, it will eventually come upon one, even if none were aged when the
* search began.
* All of this occurs while the PRC is fetching data, so it has several clock
* periods in which to complete the search.
******************************************************************************/
always
begin
    temp = TRUE;
    for (i3=0; i3<=127; i3=i3+1)
        if (!Valid[i3])
            temp = FALSE;
    #1 all_lines_are_valid = temp;
end

always @(posedge new_replace) // find the next ReplaceLine
begin
    done = FALSE;
    #2 while (!done)
    begin
        ReplaceLine = ReplaceLine + 1; //mod 128 addition
        if (!Valid[ReplaceLine])
            done = TRUE;
        else if (all_lines_are_valid & Aged[ReplaceLine])
            done = TRUE;
        else
            Aged[ReplaceLine] = 1;
    end
    line_empty = hi;
end
endmodule 
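
The replacement search described in the Line Replacement Unit comment can be exercised with a small software model. The Python sketch below is illustrative only: the function name is hypothetical, the Valid and Aged flags are modeled as Python lists, and the all-lines-valid condition is sampled once at the start of the search rather than evaluated continuously as in the Verilog.

```python
def next_replace_line(current, valid, aged):
    """Return the next line to replace; sets aged[] on lines passed over."""
    all_lines_are_valid = all(valid)
    line = current
    while True:
        line = (line + 1) % len(valid)   # wraps like the 7-bit ReplaceLine
        if not valid[line]:
            return line                  # an invalid line is always preferred
        if all_lines_are_valid and aged[line]:
            return line                  # otherwise take the next aged line
        aged[line] = True                # mark lines as aged while scanning


# Case 1: an invalid line exists, so it is chosen.
valid_a = [True, True, False, True]
aged_a = [False, False, False, False]
pick_a = next_replace_line(0, valid_a, aged_a)

# Case 2: every line is valid and none is aged; the scan ages each line on
# the first pass and selects the first line it revisits.
valid_b = [True, True, True, True]
aged_b = [False, False, False, False]
pick_b = next_replace_line(0, valid_b, aged_b)
```

The second case shows why the search always terminates: after one full pass every line is marked aged, so the next line visited is selected.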


E. PREDICTOR 


/******************************************************************************
* PREDICTOR
* Filename: predictor.v
* Author: Joseph R. Robert, Jr.
* Date: 21DEC95
* Revised: 05JAN96
*
* Purpose: This module calculates the Next Address (stored in NAR) based on the
* Most Recent Memory Access (MRMA) and the Current Address (in the CAR). The
* prediction calculation is
*
*     NAR = 2*CAR - MRMA
*
* The calculation is initiated upon each rising edge of the predict signal.
* The output NAR remains latched and valid until the next predict leading edge.
*
******************************************************************************/
module predictor (MRMA,CAR,predict,NAR);
input [0:26] MRMA,CAR;
input predict;
output [0:26] NAR;
reg [0:26] NAR;

parameter TRUE = 1'b1,
          FALSE = 1'b0,
          trace = FALSE;

// behavior
always @(posedge predict)
begin
    NAR = 2*CAR - MRMA;
    if (trace)
    begin
        $display("Predictor: NAR = 2*CAR - MRMA");
        $display(" %h = 2*%h - %h",{NAR,5'b0},{CAR,5'b0},{MRMA,5'b0});
    end
end


endmodule 
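
The prediction NAR = 2*CAR - MRMA is a constant-stride extrapolation: it equals CAR + (CAR - MRMA), so the next address is assumed to lie as far beyond the current address as the current address lies beyond the previous one. The Python sketch below is an illustrative model only; the function and constant names are hypothetical, and the result is reduced modulo 2^27 to mirror the 27-bit NAR register.

```python
ADDRESS_BITS = 27  # width of the CAR, NAR, and MRMA line addresses


def predict_next_address(car, mrma):
    """Return the predicted next line address, wrapped to 27 bits."""
    return (2 * car - mrma) % (1 << ADDRESS_BITS)


forward_stride = predict_next_address(car=104, mrma=100)   # stride of +4 lines
backward_stride = predict_next_address(car=96, mrma=100)   # stride of -4 lines
wrapped = predict_next_address(car=0, mrma=1)              # wraps below zero
```

Because the addresses are line addresses (the low five bits of the byte address are dropped), a stride of 4 here corresponds to 128 bytes.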


F. DATA LIST
/******************************************************************************
* DATA LIST
* Filename: datalist.v
* Author: Joseph R. Robert, Jr.
* Date: 5DEC95
* Revised: 05JAN96
*
* Purpose: This module emulates the PRC's Data List.
*
* An upload signal causes the Data List to store the data on data_line into
* the address specified by ActiveLine.
* A download signal causes the Data List to assert onto data_line the data in
* the address specified by ActiveLine.
*
******************************************************************************/
module datalist (data_line,ActiveLine,upload,download);
input [0:6] ActiveLine;
input upload,download;
inout [0:255] data_line;

tri [0:255] data_line;

//declare variables, constants, parameters
parameter TRUE = 1'b1,
          FALSE = 1'b0,
          hi = 1'b1,
          low = 1'b0,
          trace = TRUE;

//Data structure
reg [0:255] line [0:127],
            line_reg,
            data_line_reg;
assign data_line = data_line_reg;


//initialize signals
initial
begin
    data_line_reg <= 256'bz;
end

//BEHAVIOR
always @(posedge upload)
begin
    line_reg = data_line;
    line[ActiveLine] = line_reg;
    if (trace) begin
        $display("DATALIST uploaded this data into line %h at time %d.",
                 ActiveLine, $time);
        $display(" %h",line_reg);
    end
end

always @(posedge download)
begin
    line_reg = line[ActiveLine];
    data_line_reg = line_reg;
    if (trace) begin
        $display("DATALIST downloaded this data from line %h at time %d.",
                 ActiveLine, $time);
        $display(" %h",line_reg);
    end
end

always @(negedge download)
begin
    data_line_reg = 256'bz;
end


endmodule 


G. BUS INTERFACE UNIT
/******************************************************************************
* BUS INTERFACE UNIT
* Filename: bus_interface.v
* Author: Joseph R. Robert, Jr.
* Date: 09DEC95
* Revised: 05JAN96
*
* Purpose: This module connects the PRC with the system bus. It handles
* the protocol of data transfer in and out of the PRC.
* When this module receives a fetch signal, it latches the address in the
* NAR, and requests the bus for a burst read. It stores the incoming data
* until all four beats have been received. Then it uploads the data into the
* Data List and asserts fetch_done.
* When this module receives a send signal, it sends a cancel signal (CANX) to
* the memory module, downloads data from the Data List, and then sends the data
* to the CPU. When the transfer is finished, it asserts send_done.
* The coordination of these activities is accomplished through the use of a
* Finite State Machine.
*
******************************************************************************/
module bus_interface (NAR_IN,BURSTSTART,BG_,CPU_BR_,AACK_,DBG_,
                      send,fetch,clk,BR_,upload,download,fetch_done,
                      send_done,CANX,snoop_ignore,
                      DATALINE,D,A,DP,DPE_,TT,TSIZ,ABB_,TS_,TBST_,DBB_,TA_,HRESET_);


// Signals are defined in system.v. 

input [0:26] NAR_IN; 

input [0:1] BURSTSTART; 

input BG_,CPU_BR_,AACK_,DBG_,send.fetch,clk, HRESET_; 
output BR_,upload,download,fetch_done; 
output send_done, DPE_,CANX,snoop_ignore; 
inout [0:255] DATALINE; 

inout [0:63] D; 

inout [0:31] A; 

inout [0:7] DP; 

inout [0:4] TT; 

inout [0:2] TSIZ; 

inout ABB_,TS_,TBST_,DBB_,TA_; 


reg BR_,upload,download,fetch_done,send_done,CANX,snoop_ignore; 
tri [0:255] DATALINE; 

tri [0:63] D; 

tri [0:31] A; 

tri [0:7] DP; 

tri [0:4] TT; 

tri [0:3] AP; 

tri [0:2] TSIZ; 
tri ABB_,TS_,TBST_,DBB_,TA_,DPE_; 


//declare variables, constants, parameters 
parameter TRUE = 1'b1, 
FALSE = 1'b0, 
hi = 1'b1, 
low = 1'b0, 
trace = FALSE; 


//Address related 

reg [0:31] NAR; 

reg [0:31] a_reg; 
assign A = a_reg; 

reg [0:3] ap_reg, addr_parity_calc; 
assign AP = ap_reg; 

reg [0:1] burst_start; 


//Data related 

reg [0:255] data_line_reg, data_line; 
assign DATALINE = data_line_reg; 

reg [0:63] d_reg,data_reg; 
assign D = d_reg; 

reg [0:7] dp_reg, data_parity_calc, data_parity_in; 
assign DP = dp_reg; 


//Other external control signals 
reg [0:4] tt_reg, Transfer_type; 
assign TT = tt_reg; 
parameter //for Transfer_type 
none = 5'bz, 
burst_write = 5'b00110, //06 
burst_read = 5'b01110, //0E 
burst_read_atomic = 5'b11110; //1E 
reg [0:2] tsiz_reg; 
assign TSIZ = tsiz_reg; 
reg abb_reg_,dbb_reg_,ts_reg_,tbst_reg_,ta_reg_; 
assign ABB_ = abb_reg_; 
assign DBB_ = dbb_reg_; 
assign TS_ = ts_reg_; 
assign TBST_ = tbst_reg_; 
assign TA_ =ta_reg_; 


//Other internal control signals 

reg [0:2] i; //counter 

reg [0:1] j; //counter 

wire qual_BG_,qual_DBG_; 

reg AB_Master, Transfer_in_progress, Transfer_start,Addr_termination, 
Data_Parity_Error; 
assign DPE_ = ~Data_Parity_Error; 




event transfer_acknowledged,start_send; 


//Finite State Machine variable and parameters 
reg [0:3] state, next_state; 
reg [0:1] inputs2; 
reg input1; 
parameter idle = 0, 
fetch1 = 1, 
fetch2 = 2, 
fetch3 = 3, 
send1 = 5, 
send2 = 6; 
//initialize signals 
initial 
begin 
BR_ <= hi; 
upload <= low; 
download <= low; 
fetch_done <= low; 
CANX <= low; 
NAR <= 32'bz; 
a_reg <= 32'bz; 
ap_reg <= 4'bz; 
addr_parity_calc <= 4'bz; 
burst_start <= 2'bz; 
data_line_reg <= 256'bz; 
data_line <= 256'bz; 
d_reg <= 64'bz; 
data_reg <= 64'bz; 
dp_reg <= 8'bz; 
data_parity_calc <= 8'bz; 
data_parity_in <= 8'bz; 
tt_reg <= 5'bz; 
tsiz_reg <= 3'bz; 
abb_reg_ <= 'bz; 
dbb_reg_ <= 'bz; 
ts_reg_ <= 'bz; 
tbst_reg_ <= 'bz; 
ta_reg_ <= 'bz; 
i <= 3'bz; 
j <= 2'bz; 
AB_Master <= low; 
Transfer_in_progress <= low; 
Transfer_start <= low; 
Addr_termination <= low; 
Data_Parity_Error <= low; 
send_done <= low; 
snoop_ignore <= low; 
state <= 0; 
next_state <= 0; 
inputs2 <= 2'bz; 
input1 <= 'bz; 
end 
//ADDRESS BUS ARBITRATION 
assign qual_BG_ = ~(!BG_ & ABB_); 


//Assume mastership 


always @(posedge clk) 
if (qual_BG_ == low) 
begin 
abb_reg_ = #2 low; 
AB_Master = TRUE; 
BR_ <= #1 hi; 

end 


//Calculate address parity. 
always @(NAR) 
begin 
addr_parity_calc[0] <= ~^NAR[0:7]; 
addr_parity_calc[1] <= ~^NAR[8:15]; 
addr_parity_calc[2] <= ~^NAR[16:23]; 
addr_parity_calc[3] <= ~^NAR[24:31]; 
end 


//Transfer address 
always @(posedge clk) 
if (qual_BG_ == low) 
begin 
ts_reg_ = #7 low; 
Transfer_start <= TRUE; 
a_reg <= NAR; 
ap_reg <= addr_parity_calc; 
tt_reg <= burst_read; 
tsiz_reg <= 3'b010; 
tbst_reg_ <= low; 
snoop_ignore <= hi; 
if (trace) 
$display("BIU started read from address %h at time %d.", 
NAR, $time); 
end 


always @(posedge clk) 
if (AB_Master & TS_==low) 
begin 
ts_reg_ = #7 hi; 
wait (AACK_ == low); 
Addr_termination = TRUE; 
end 


//Address termination 
always @(posedge clk) 
if (Addr_termination) 
begin 
#7 ts_reg_ <= 'bz; 
a_reg <= 'bz; 
ap_reg <= 'bz; 
tt_reg <= 'bz; 
tsiz_reg <= 'bz; 
tbst_reg_ <= 'bz; 
snoop_ignore <= low; 
//insert other addr transfer characteristics here. 
abb_reg_ <= #2 hi; 
abb_reg_ <= #8 'bz; 
AB_Master = FALSE; 
Addr_termination = FALSE; 
end 


//DATA BUS ARBITRATION FOR FETCHES 
assign qual_DBG_ = ~(!DBG_ & DBB_); 


always @(posedge clk) 
begin 
if (TA_ == low) 
-> transfer_acknowledged; 
end 


//calculate data parity. Odd parity, including parity bit. 
always @(data_reg) 
begin 
data_parity_calc[0] <= ~^data_reg[0:7]; 
data_parity_calc[1] <= ~^data_reg[8:15]; 
data_parity_calc[2] <= ~^data_reg[16:23]; 
data_parity_calc[3] <= ~^data_reg[24:31]; 
data_parity_calc[4] <= ~^data_reg[32:39]; 
data_parity_calc[5] <= ~^data_reg[40:47]; 
data_parity_calc[6] <= ~^data_reg[48:55]; 
data_parity_calc[7] <= ~^data_reg[56:63]; 
end 


always 
begin 
//wait for qualified data bus grant and transfer start. 
wait(qual_DBG_==low & Transfer_start); 
@(posedge clk) //assume data bus mastership 
dbb_reg_ <= #7 low; 
i = 0; 
while (i<4) 
begin 
@(transfer_acknowledged) //latch beat 
data_reg <= D; 
data_parity_in = DP; 
#2 if (trace) $display(" BIU: %h at %d",data_reg,$time); 
#2 if (data_parity_in != data_parity_calc) 
begin 
$display("BIU: data parity error."); 
$display(" Calculated parity: %b", 
data_parity_calc); 
$display(" Received parity: %b", 
data_parity_in); 
Data_Parity_Error = TRUE; 
i=4; 
end 
else 
begin 
if (i==0) data_line[ 0: 63] = data_reg; 
if (i==1) data_line[ 64:127] = data_reg; 
if (i==2) data_line[128:191] = data_reg; 
if (i==3) data_line[192:255] = data_reg; 
i=i+1; 
end 
end 


Transfer_in_progress <= FALSE; 
Transfer_start <= FALSE; 
dbb_reg_ = #4 hi; 
dbb_reg_ = #8 'bz; 

end 


//DATA BUS PROTOCOL FOR SENDS (PRC acting as memory module) 
always @(start_send) 
begin 
i=0; 
while (i<4) begin 
@(posedge clk); 
#1 ta_reg_ = 'bz; 
j = burst_start+i; //j is mod 4 
if (j==0) data_reg = data_line[ 0: 63]; 
if (j==1) data_reg = data_line[ 64:127]; 
if (j==2) data_reg = data_line[128:191]; 
if (j==3) data_reg = data_line[192:255]; 
d_reg = data_reg; 
#4 dp_reg <= data_parity_calc; 
ta_reg_ <= low; 
i=i+1; 
end 

send_done <= hi; 
@(posedge clk) 
ta_reg_ <= #7 'bz; 
d_reg <= #7 64'bz; 
dp_reg <= #7 8'bz; 
end 
//FINITE STATE MACHINE 
always @(negedge HRESET_) 
begin 
if (HRESET_ == low) 
begin 
state <= idle; 
next_state <= idle; 
wait(HRESET_ == hi); 
end 
end 


always 
begin 
#2 state = next_state; 
#1 case (state) 
idle: //0 
begin 
upload <= low; 
fetch_done <= low; 
send_done <= low; 
CANX <= low; 
data_line_reg <= 256'bz; 
@(posedge clk) inputs2 = {send,fetch}; 
case (inputs2) 
2'b00: next_state = idle; 
2'b01: next_state = fetch1; 
2'b10: next_state = send1; 
2'b11: next_state = idle; //This should not happen. 
endcase 
end 

fetch1: //1 
begin 
//1. Latch next address. 
NAR[0:26] <= NAR_IN; 
NAR[27:31] <= 5'b0; 
//2. Request Bus 
BR_ <= low; 
Transfer_in_progress <= TRUE; 
@(posedge clk) 
next_state = fetch2; 
end 

fetch2: //2 
begin 
//1. Wait for all data to be received. 
@(posedge clk) input1 = Transfer_in_progress; 
case (input1) 
1'b0: next_state = fetch3; 
1'b1: next_state = fetch2; 
endcase 
end 

fetch3: //3 
begin 
//1. Upload the data line. 
data_line_reg <= data_line; 
upload <= hi; 
//2. Assert fetch_done. 
fetch_done <= hi; 
@(posedge clk) 
next_state = idle; 
end 

send1: //5 
begin 
//1. Cancel the memory access. 
CANX <= hi; 
//2. Latch burst_start. 
burst_start <= BURSTSTART; 
//3. Download data from the data list. 
download <= hi; 
#5 data_line <= DATALINE; 
@(posedge clk) 
next_state = send2; 
end 

send2: //6 
begin 
//1. Send data to CPU 
-> start_send; 
CANX <= low; 
download <= low; 
@(posedge clk) input1 = {send_done}; 
case (input1) 
1'b0: next_state = send2; 
1'b1: next_state = idle; 
endcase 
end 

default: 
begin 
$display("state error in module bus_interface."); 
$display(" state = %b.",state); 
end 

endcase 
end 


endmodule 
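
The eight reduction-XNOR assignments in the data parity block of bus_interface.v compute one odd-parity bit per byte lane. The sketch below expresses the same calculation as a loop. The module name parity_sketch, the function name beat_parity, and the shift-based byte extraction are assumptions made for illustration; none of them appear in the thesis files.

```verilog
// Illustrative sketch only; names here are hypothetical, not from the thesis.
module parity_sketch;

  // Returns one odd-parity bit per byte of a 64-bit beat.
  // Bit 0 is the most significant bit, as in the listing above.
  function [0:7] beat_parity;
    input [0:63] beat;
    reg [0:7] lane;
    integer k;
    begin
      for (k = 0; k < 8; k = k + 1) begin
        // Byte k is beat[8k : 8k+7]; shifting right moves it into the
        // low-order eight bits, and the assignment truncates the rest.
        lane = beat >> (8 * (7 - k));
        // ~^ is reduction XNOR: the bit is 1 when the byte holds an even
        // number of 1s, so data plus parity always has an odd count.
        beat_parity[k] = ~^lane;
      end
    end
  endfunction

  initial begin
    // Byte 0 is 8'h00 (parity bit 1); byte 1 is 8'h01 (parity bit 0).
    $display("%b", beat_parity(64'h0001_0000_0000_0000)); // prints 10111111
  end

endmodule
```

The shift is used instead of an indexed part-select so that the sketch stays within the original Verilog language accepted by older simulators.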
/*****************************************************************************
* Transaction Sequencer - Prediction Test 
* Filename: sequencer4.v 
* Author: Joseph R. Robert, Jr. 
* Date: 21DEC95 
* Revised: 05JAN96 
* 
* Purpose: This is one in a set of modules which perform a sequence of CPU 
* transactions. This sequencer causes a series of CPU operations that provide 
* a comprehensive test of the PRC. It demonstrates a majority of the PRC's 
* capabilities, showing when the Line Manager selects new lines, when and how 
* the Predictor functions, when the CPU starts a read or write and the data 
* involved. It shows when the Bus Interface Unit fetches data from memory. 
* The DataList reports the flow of data in and out of it. The only significant 
* behavior not exercised by this test is the function of the Line Replacement 
* Unit when the PRC is full. That is handled with Sequencer #5. 
* 
* Sequence #4: 
* burst_read 00h 
* burst_read 20h - PRC should predict 40h and fetch data. 
* burst_read 180h - PRC should start a new line. 
* burst_read 1A0h - PRC should predict 1C0h. 
* burst_read 40h - already in PRC, should predict 60h. 
* burst_write 1C0h - should flush line. 
* burst_read 60h - already in PRC, predicts 80h. 
* burst_read 100h - PRC should start a new line. 
* 
* When using this sequencer, set all trace flags to TRUE (except the 
* Controller), and run the simulation for 6000 steps. 
* 
* General Timing instructions for all Sequencers: 
* Use an initial block for each transaction. You must ensure that the 
* following rules are adhered to: 
* 1. Before the first transaction, use 
*      repeat(2)@(posedge clk) 
* 2. Before the first line of the second transaction, use 
*      wait(ABB_==low); 
*      wait(ABB_==hi); 
* 3. There can be only two transactions pipelined at a time. You must ensure 
*    manually that the first operation is complete before the third begins. 
*    When scheduling the current transaction, look at the transaction before 
*    last. Wait for that TA_ to finish. Also, wait for the ABB_ from the 
*    previous transaction to go high. 
* 4. A burst read takes 330 simulation time units = 22 clock cycles. 
*****************************************************************************/
module sequencer(Transfer_size,clk,pp,address,data,line,Transfer_type, 
Transfer_code,need_bus_trigger_,ABB_); 


input clk, ABB_; 

output pp,need_bus_trigger_; 
output [0:31] address; 
output [0:63] data; 

output [0:255] line; 

output [0:4] Transfer_type; 
output [0:2] Transfer_size; 
output [0:1] Transfer_code; 
reg pp,need_bus_trigger_; 
reg [0:31] address; 

reg [0:63] data; 

reg [0:255] line; 

reg [0:4] Transfer_type; 
reg [0:2] Transfer_size; 
reg [0:1] Transfer_code; 


//declare variables, constants, parameters 
parameter TRUE = 1'b1, 
FALSE = 1'b0, 
hi = 1'b1, 
low = 1'b0; 

parameter //for Transfer_type 
none = 5'bz, 
write = 5'b00010, //02 
write_atomic = 5'b10010, //12 
read = 5'b01010, //0A 
read_atomic = 5'b11010, //1A 
burst_write = 5'b00110, //06 
burst_read = 5'b01110, //0E 
burst_read_atomic = 5'b11110; //1E 
parameter //for Transfer_code 
data_transfer = 2'b00, 
touch_load = 2'b01, 
instruction_fetch = 2'b10, 
reserved = 2'b11; 

//initialize signals 
initial 
begin 
pp <= 0; 
address <= 32'bz; 
line <= 256'bz; 
end 


//Perform sequence of transactions 
initial 
begin 
repeat(2)@(posedge clk); 
//BURST READ 
pp <= ~pp; 
address <= 32'h00000000; 
Transfer_type <= burst_read; 
Transfer_code <= data_transfer; 
need_bus_trigger_ <= #4 low; 
need_bus_trigger_ <= #6 hi; 
end 

initial 
begin 
wait(ABB_==low); 
wait(ABB_==hi); 
//BURST READ 
pp <= ~pp; 
address <= 32'h00000020; 
Transfer_type <= burst_read; 
Transfer_code <= data_transfer; 
need_bus_trigger_ <= #4 low; 
need_bus_trigger_ <= #6 hi; 
end 

initial 
begin 
repeat(75)@(posedge clk); 
//BURST READ 
pp <= ~pp; 
address <= 32'h00000180; 
Transfer_type <= burst_read; 
Transfer_code <= data_transfer; 
need_bus_trigger_ <= #4 low; 
need_bus_trigger_ <= #6 hi; 
end 

initial 
begin 
repeat(100)@(posedge clk); 
//BURST READ 
pp <= ~pp; 
address <= 32'h000001A0; 
Transfer_type <= burst_read; 
Transfer_code <= data_transfer; 
need_bus_trigger_ <= #4 low; 
need_bus_trigger_ <= #6 hi; 
end 

initial 
begin 
repeat(150)@(posedge clk); 
//BURST READ 
pp <= ~pp; 
address <= 32'h00000040; 
Transfer_type <= burst_read; 
Transfer_code <= data_transfer; 
need_bus_trigger_ <= #4 low; 
need_bus_trigger_ <= #6 hi; 
end 

initial 
begin 
repeat(200)@(posedge clk); 
//BURST WRITE 
pp <= ~pp; 
address <= 32'h000001C0; 
Transfer_type <= burst_write; 
Transfer_code <= data_transfer; 
line <= {64'h7777777777777777, 64'h8888888888888888, 
64'h1111111111111111, 64'h3333333333333333}; 
need_bus_trigger_ <= #4 low; 
need_bus_trigger_ <= #6 hi; 
end 

initial 
begin 
repeat(225)@(posedge clk); 
//BURST READ 
pp <= ~pp; 
address <= 32'h00000060; 
Transfer_type <= burst_read; 
Transfer_code <= data_transfer; 
need_bus_trigger_ <= #4 low; 
need_bus_trigger_ <= #6 hi; 
end 

initial 
begin 
repeat(250)@(posedge clk); 
//BURST READ 
pp <= ~pp; 
address <= 32'h00000100; 
Transfer_type <= burst_read; 
Transfer_code <= data_transfer; 
need_bus_trigger_ <= #4 low; 
need_bus_trigger_ <= #6 hi; 
end 


endmodule 
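
Every transaction in the sequencer repeats the same statements: toggle pp, drive the address and transfer type, and pulse need_bus_trigger_ low between 4 and 6 time units later. A minimal sketch of how that pattern could be factored into a task follows. The module name sequencer_task_sketch and the task name issue are hypothetical, and only the signals needed for a burst read are declared.

```verilog
// Illustrative sketch only; this is not one of the thesis files.
module sequencer_task_sketch(clk);
  input clk;

  parameter hi = 1'b1, low = 1'b0;
  parameter burst_read = 5'b01110;   // 0E, as in the sequencer listing
  parameter data_transfer = 2'b00;

  reg pp, need_bus_trigger_;
  reg [0:31] address;
  reg [0:4] Transfer_type;
  reg [0:1] Transfer_code;

  // Drives one transaction. The task finishes in zero simulation time,
  // because the delays are intra-assignment delays on nonblocking
  // assignments, so calls from different initial blocks do not overlap
  // unless they are made at the same instant.
  task issue;
    input [0:4] ttype;
    input [0:31] addr;
    begin
      pp <= ~pp;
      address <= addr;
      Transfer_type <= ttype;
      Transfer_code <= data_transfer;
      need_bus_trigger_ <= #4 low;
      need_bus_trigger_ <= #6 hi;
    end
  endtask

  initial pp = 0;

  // Same schedule as the third transaction of Sequence #4.
  initial begin
    repeat(75) @(posedge clk);
    issue(burst_read, 32'h00000180);
  end
endmodule
```

The pipelining rules in the header still apply; the task only removes the repeated assignments, not the need to space the calls manually.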


i 6 PREDICTION TEST RESULTS 


Host command: verilog 
Command arguments: 
-f verilog_arguments 
bus_interface.v 
prc.v 
snooper.v 
controller.v 
datalist.v 
line_mgr.v 
predictor.v 
testbench.v 
arbiter.v 
cpu.v 
memory.v 
sequencer5.v 


VERILOG-XL 2.1.2 log file created Feb 2, 1996 13:14:29 
VERILOG-XL 2.1.2 Feb 2, 1996 13:14:29 


Copyright (c) 1994 Cadence Design Systems, Inc. All Rights Reserved. 
Unpublished -- rights reserved under the copyright laws of the United States. 


Copyright (c) 1994 UNIX Systems Laboratories, Inc. Reproduced with Permission. 


THIS SOFTWARE AND ON-LINE DOCUMENTATION CONTAIN CONFIDENTIAL INFORMATION 
AND TRADE SECRETS OF CADENCE DESIGN SYSTEMS, INC. USE, DISCLOSURE, OR 
REPRODUCTION IS PROHIBITED WITHOUT THE PRIOR EXPRESS WRITTEN PERMISSION OF 
CADENCE DESIGN SYSTEMS, INC. 

RESTRICTED RIGHTS LEGEND 


Use, duplication, or disclosure by the Government is subject to 
restrictions as set forth in subparagraph (c)(1)(ii) of the Rights in 
Technical Data and Computer Software clause at DFARS 252.227-7013 or 
subparagraphs (c)(1) and (2) of Commercial Computer Software -- Restricted 
Rights at 48 CFR 52.227-19, as applicable. 


Cadence Design Systems, Inc. 
555 River Oaks Parkway 
San Jose, California 95134 


For technical assistance please contact the Cadence Response Center at 
1-800-CADENC2 or send email to crc_customers@cadence.com 


For more information on Cadence's Verilog-XL product line send email to 
talkverilog@cadence.com 


Compiling source file "bus_interface.v" 
Compiling source file "prc.v" 
Compiling source file "snooper.v" 
Compiling source file "controller.v" 
Compiling source file "datalist.v" 
Compiling source file "line_mgr.v" 
Compiling source file "predictor.v" 
Compiling source file "testbench.v" 
Compiling source file "arbiter.v" 
Compiling source file "cpu.v" 
Compiling source file "memory.v" 
Compiling source file "sequencer5.v" 


Highest level modules: 

testbench 

Line_mgr selected new ActiveLine = 0 at $d 5 
Line_mgr selected new ActiveLine = 1 at $d 1162 
Line_mgr selected new ActiveLine = 2 at $d 2287 
Line_mgr selected new ActiveLine = 3 at $d 3412 
Line_mgr selected new ActiveLine = 4 at $d 4537 
Line_mgr selected new ActiveLine = 5 at $d 5662 
Line_mgr selected new ActiveLine = 6 at $d 6787 
Line_mgr selected new ActiveLine = 7 at $d 7912 
Line_mgr selected new ActiveLine = 8 at $d 9037 
Line_mgr selected new ActiveLine = 9 at $d 10162 
Line_mgr selected new ActiveLine = 10 at $d 11287 
Line_mgr selected new ActiveLine = 11 at $d 12412 
Line_mgr selected new ActiveLine = 12 at $d 13537 
Line_mgr selected new ActiveLine = 13 at $d 14662 
Line_mgr selected new ActiveLine = 14 at $d 15787 
Line_mgr selected new ActiveLine = 15 at $d 16912 
Line_mgr selected new ActiveLine = 16 at $d 18037 
Line_mgr selected new ActiveLine = 17 at $d 19162 
Line_mgr selected new ActiveLine = 18 at $d 20287 
Line_mgr selected new ActiveLine = 19 at $d 21412 
Line_mgr selected new ActiveLine = 20 at $d 22537 
Line_mgr selected new ActiveLine = 21 at $d 23662 
Line_mgr selected new ActiveLine = 22 at $d 24787 
Line_mgr selected new ActiveLine = 23 at $d 25912 
Line_mgr selected new ActiveLine = 24 at $d 27037 
Line_mgr selected new ActiveLine = 25 at $d 28162 
Line_mgr selected new ActiveLine = 26 at $d 29287 
Line_mgr selected new ActiveLine = 27 at $d 30412 
Line_mgr selected new ActiveLine = 28 at $d 31537 
Line_mgr selected new ActiveLine = 29 at $d 32662 
Line_mgr selected new ActiveLine = 30 at $d 33787 
Line_mgr selected new ActiveLine = 31 at $d 34912 
Line_mgr selected new ActiveLine = 32 at $d 36037 
Line_mgr selected new ActiveLine = 33 at $d 37162 
Line_mgr selected new ActiveLine = 34 at $d 38287 
Line_mgr selected new ActiveLine = 35 at $d 39412 
Line_mgr selected new ActiveLine = 36 at $d 40537 
Line_mgr selected new ActiveLine = 37 at $d 41662 
Line_mgr selected new ActiveLine = 38 at $d 42787 
Line_mgr selected new ActiveLine = 39 at $d 43912 
Line_mgr selected new ActiveLine = 40 at $d 45037 
Line_mgr selected new ActiveLine = 41 at $d 46162 
Line_mgr selected new ActiveLine = 42 at $d 47287 
Line_mgr selected new ActiveLine = 43 at $d 48412 
Line_mgr selected new ActiveLine = 44 at $d 49537 
Line_mgr selected new ActiveLine = 45 at $d 50662 
Line_mgr selected new ActiveLine = 46 at $d 51787 
Line_mgr selected new ActiveLine = 47 at $d 52912 
Line_mgr selected new ActiveLine = 48 at $d 54037 
Line_mgr selected new ActiveLine = 49 at $d 55162 
Line_mgr selected new ActiveLine = 50 at $d 56287 
Line_mgr selected new ActiveLine = 51 at $d 57412 
Line_mgr selected new ActiveLine = 52 at $d 58537 
Line_mgr selected new ActiveLine = 53 at $d 59662 
Line_mgr selected new ActiveLine = 54 at $d 60787 
Line_mgr selected new ActiveLine = 55 at $d 61912 
Line_mgr selected new ActiveLine = 56 at $d 63037 
Line_mgr selected new ActiveLine = 57 at $d 64162 
Line_mgr selected new ActiveLine = 58 at $d 65287 
Line_mgr selected new ActiveLine = 59 at $d 66412 
Line_mgr selected new ActiveLine = 60 at $d 67537 
Line_mgr selected new ActiveLine = 61 at $d 68662 
Line_mgr selected new ActiveLine = 62 at $d 69787 
Line_mgr selected new ActiveLine = 63 at $d 70912 
Line_mgr selected new ActiveLine = 64 at $d 72037 
Line_mgr selected new ActiveLine = 65 at $d 73162 
Line_mgr selected new ActiveLine = 66 at $d 74287 
Line_mgr selected new ActiveLine = 67 at $d 75412 
Line_mgr selected new ActiveLine = 68 at $d 76537 
Line_mgr selected new ActiveLine = 69 at $d 77662 
Line_mgr selected new ActiveLine = 70 at $d 78787 
Line_mgr selected new ActiveLine = 71 at $d 79912 
Line_mgr selected new ActiveLine = 72 at $d 81037 
Line_mgr selected new ActiveLine = 73 at $d 82162 
Line_mgr selected new ActiveLine = 74 at $d 83287 
Line_mgr selected new ActiveLine = 75 at $d 84412 
Line_mgr selected new ActiveLine = 76 at $d 85537 
Line_mgr selected new ActiveLine = 77 at $d 86662 
Line_mgr selected new ActiveLine = 78 at $d 87787 
Line_mgr selected new ActiveLine = 79 at $d 88912 
Line_mgr selected new ActiveLine = 80 at $d 90037 
Line_mgr selected new ActiveLine = 81 at $d 91162 
Line_mgr selected new ActiveLine = 82 at $d 92287 
Line_mgr selected new ActiveLine = 83 at $d 93412 
Line_mgr selected new ActiveLine = 84 at $d 94537 
Line_mgr selected new ActiveLine = 85 at $d 95662 
Line_mgr selected new ActiveLine = 86 at $d 96787 
Line_mgr selected new ActiveLine = 87 at $d 97912 
Line_mgr selected new ActiveLine = 88 at $d 99037 
Line_mgr selected new ActiveLine = 89 at $d 100162 
Line_mgr selected new ActiveLine = 90 at $d 101287 
Line_mgr selected new ActiveLine = 91 at $d 102412 
Line_mgr selected new ActiveLine = 92 at $d 103537 
Line_mgr selected new ActiveLine = 93 at $d 104662 
Line_mgr selected new ActiveLine = 94 at $d 105787 
Line_mgr selected new ActiveLine = 95 at $d 106912 
Line_mgr selected new ActiveLine = 96 at $d 108037 
Line_mgr selected new ActiveLine = 97 at $d 109162 
Line_mgr selected new ActiveLine = 98 at $d 110287 
Line_mgr selected new ActiveLine = 99 at $d 111412 
Line_mgr selected new ActiveLine = 100 at $d 112537 
Line_mgr selected new ActiveLine = 101 at $d 113662 
Line_mgr selected new ActiveLine = 102 at $d 114787 
Line_mgr selected new ActiveLine = 103 at $d 115912 
Line_mgr selected new ActiveLine = 104 at $d 117037 
Line_mgr selected new ActiveLine = 105 at $d 118162 
Line_mgr selected new ActiveLine = 106 at $d 119287 
Line_mgr selected new ActiveLine = 107 at $d 120412 
Line_mgr selected new ActiveLine = 108 at $d 121537 
Line_mgr selected new ActiveLine = 109 at $d 122662 
Line_mgr selected new ActiveLine = 110 at $d 123787 
Line_mgr selected new ActiveLine = 111 at $d 124912 
Line_mgr selected new ActiveLine = 112 at $d 126037 
Line_mgr selected new ActiveLine = 113 at $d 127162 
Line_mgr selected new ActiveLine = 114 at $d 128287 
Line_mgr selected new ActiveLine = 115 at $d 129412 
Line_mgr selected new ActiveLine = 116 at $d 130537 
Line_mgr selected new ActiveLine = 117 at $d 131662 
Line_mgr selected new ActiveLine = 118 at $d 132787 
Line_mgr selected new ActiveLine = 119 at $d 133912 
Line_mgr selected new ActiveLine = 120 at $d 135037 


Line_mgr selected new ActiveLine = 121 at $d 136162 
Line_mgr selected new ActiveLine = 122 at $d 137287 
Line_mgr selected new ActiveLine = 123 at $d 138412 
Line_mgr selected new ActiveLine = 124 at $d 139537 
Line_mgr selected new ActiveLine = 125 at $d 140662 
Line_mgr selected new ActiveLine = 126 at $d 141787 
Line_mgr selected new ActiveLine = 127 at $d 142912 
Line_mgr selected new ActiveLine = 0 at $d 145162 
Line_mgr selected new ActiveLine = 1 at $d 146287 
Line_mgr selected new ActiveLine = 2 at $d 147412 
Line_mgr selected new ActiveLine = 3 at $d 148537 


L122 "testbench.v": $finish at simulation time 152010 

31769681 simulation events + 8392 accelerated events 

CPU time: 1.0 secs to compile + 0.9 secs to link + 116.2 secs in simulation 
End of VERILOG-XL 2.1.2 Feb 2,1996 13:16:34 
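
The regular spacing of the Line Manager messages in this log follows from the sequencer timing rules. A burst read of 330 time units in 22 clock cycles gives 15 time units per clock, and each pass of the Sequencer #5 loop waits 50 + 25 = 75 clocks, so successive line selections should be 75 x 15 = 1125 time units apart. The sketch below checks that arithmetic against the first few entries. The module and parameter names are hypothetical, and the starting time of 1162 is taken from the log rather than derived.

```verilog
// Illustrative sketch only; names are hypothetical, not from the thesis.
module line_time_sketch;
  parameter UNITS_PER_CLK = 15;       // 330 time units / 22 clock cycles
  parameter CLKS_PER_ITER = 50 + 25;  // the two repeat() counts in the loop
  parameter FIRST_TIME = 1162;        // ActiveLine = 1 in the log above

  integer n;

  initial
    for (n = 1; n <= 4; n = n + 1)
      // Prints 1162, 2287, 3412 and 4537 for lines 1 to 4,
      // which are the times reported in the log.
      $display("ActiveLine %0d expected at %0d",
               n, FIRST_TIME + (n - 1) * UNITS_PER_CLK * CLKS_PER_ITER);
endmodule
```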
/*****************************************************************************
* Transaction Sequencer - Line Replacement Test 
* Filename: sequencer5.v 
* Author: Joseph R. Robert, Jr. 
* Date: 05JAN96 
* Revised: 05JAN96 
* 
* Purpose: This is one in a set of modules which perform a sequence of CPU 
* transactions. This Sequencer causes a series of CPU operations which will 
* quickly fill the PRC. This will test the Line Replacement Unit's behavior 
* when it needs to start replacing previously used lines. 
* 
* Sequence #5: 
* 
* for i=0 to 132, 
*   burst_read i00h - PRC should switch to new line i. 
*   burst_read i20h - PRC should predict i40h, and store data in line i. 
* next i 
* 
* When using this sequencer, set all trace flags to FALSE, except for the Line 
* Manager, and run the simulation for 152000 steps. 
* 
* General Timing instructions for all Sequencers: 
* Use an initial block for each transaction. You must ensure that the 
* following rules are adhered to: 
* 1. Before the first transaction, use 
*      repeat(2)@(posedge clk) 
* 2. Before the first line of the second transaction, use 
*      wait(ABB_==low); 
*      wait(ABB_==hi); 
* 3. There can be only two transactions pipelined at a time. You must ensure 
*    manually that the first operation is complete before the third begins. 
*    When scheduling the current transaction, look at the transaction before 
*    last. Wait for that TA_ to finish. Also, wait for the ABB_ from the 
*    previous transaction to go high. 
* 4. A burst read takes 330 simulation time units = 22 clock cycles. 
*****************************************************************************/
module sequencer(Transfer_size,clk,pp,address,data,line,Transfer_type, 
Transfer_code,need_bus_trigger_,ABB_); 


input clk, ABB_; 

output pp,need_bus_trigger_; 
output [0:31] address; 

output [0:63] data; 

output [0:255] line; 

output [0:4] Transfer_type; 
output [0:2] Transfer_size; 
output [0:1] Transfer_code; 
reg pp,need_bus_trigger_; 
reg [0:31] address; 

reg [0:63] data; 

reg [0:255] line; 

reg [0:4] Transfer_type; 
reg [0:2] Transfer_size; 
reg [0:1] Transfer_code; 


//declare variables, constants, parameters 
parameter TRUE = 1'b1, 
FALSE = 1'b0, 
hi = 1'b1, 
low = 1'b0; 

parameter //for Transfer_type 
none = 5'bz, 
write = 5'b00010, //02 
write_atomic = 5'b10010, //12 
read = 5'b01010, //0A 
read_atomic = 5'b11010, //1A 
burst_write = 5'b00110, //06 
burst_read = 5'b01110, //0E 
burst_read_atomic = 5'b11110; //1E 
parameter //for Transfer_code 
data_transfer = 2'b00, 
touch_load = 2'b01, 
instruction_fetch = 2'b10, 
reserved = 2'b11; 

//Other internal control signals 
reg [0:7] i; // counter 

//initialize signals 
initial 
begin 
pp <= 0; 
address <= 32'bz; 
line <= 256'bz; 
end 


//Perform sequence of transactions 
initial 
begin 
repeat(2)@(posedge clk); 
//BURST READ 
pp <= ~pp; 
address <= 32'h00000000; 
Transfer_type <= burst_read; 
Transfer_code <= data_transfer; 
need_bus_trigger_ <= #4 low; 
need_bus_trigger_ <= #6 hi; 
end 

initial 
begin 
wait(ABB_==low); 
wait(ABB_==hi); 
//BURST READ 
pp <= ~pp; 
address <= 32'h00000020; 
Transfer_type <= burst_read; 
Transfer_code <= data_transfer; 
need_bus_trigger_ <= #4 low; 
need_bus_trigger_ <= #6 hi; 
end 

initial 
begin 
repeat(25)@(posedge clk); 
for (i=1; i<=132; i=i+1) 
begin 
repeat(50)@(posedge clk); 
//BURST READ 
pp <= ~pp; 
address <= {12'b0, i, 12'b0}; 
Transfer_type <= burst_read; 
Transfer_code <= data_transfer; 
need_bus_trigger_ <= #4 low; 
need_bus_trigger_ <= #6 hi; 
repeat(25)@(posedge clk); 
//BURST READ 
pp <= ~pp; 
address <= {12'b0, i, 12'h020}; 
Transfer_type <= burst_read; 
Transfer_code <= data_transfer; 
need_bus_trigger_ <= #4 low; 
need_bus_trigger_ <= #6 hi; 
end 
end 


endmodule 
Host command: verilog 
Command arguments: 
-f verilog_arguments 
bus_interface.v 
prc.v 
snooper.v 
controller.v 
datalist.v 
line_mgr.v 
predictor.v 
testbench.v 
arbiter.v 
cpu.v 
memory.v 
sequencer4.v 

Compiling source file "bus_interface.v" 
Compiling source file "prc.v" 
Compiling source file "snooper.v" 
Compiling source file "controller.v" 
Compiling source file "datalist.v" 
Compiling source file "line_mgr.v" 
Compiling source file "predictor.v" 
Compiling source file "testbench.v" 
Compiling source file "arbiter.v" 
Compiling source file "cpu.v" 
Compiling source file "memory.v" 
Compiling source file "sequencer4.v" 
Highest level modules: 

testbench 
Line_mgr selected new ActiveLine = 0 at $d 5 
CPU started read from address 00000000 at time 45. 
CPU read: 0001020304050607 at 181 
CPU read: 08090a0b0c0d0e0f at 241 
CPU read: 1011121314151617 at 301 
CPU read: 18191a1b1c1d1e1f at 361 
CPU started read from address 00000020 at time 390. 
BIU started read from address 00000040 at time As 
CPU read: 2021222324252627 at 496 
CPU read: 28292a2b2c2d2e2f at 556 
CPU read: 3031323334353637 at 616 
CPU read: 38393a3b3c3d3e3f at 676 
BIU: 4041424344454647 at 812 
BIU: 48494a4b4c4d4e4f at 872 
BIU: 5051525354555657 at 932 
BIU: 58595a5b5c5d5e5f at 992 
DATALIST uploaded this data into line 00 at time 1008. 
404142434445464748494a4b4c4d4e4f505152535455565758595a5b5c5d5e5f 
CPU started read from address 00000180 at time 1140. 
Line_mgr selected new ActiveLine = 1 at $d 1162 
CPU read: 0001020304050607 at 1276 
CPU read: 08090a0b0c0d0e0f at 1336 
CPU read: 1011121314151617 at 1396 
CPU read: 18191a1b1c1d1e1f at 1456 
CPU started read from address 000001a0 at time 1515. 
CPU read: 2021222324252627 at 1651 
BIU started read from address 000001c0 at time lesa: 
CPU read: 28292a2b2c2d2e2f at 1711 
CPU read: 3031323334353637 at 1771 
CPU read: 38393a3b3c3d3e3f at 1831 
BIU: 4041424344454647 at 1967 
BIU: 48494a4b4c4d4e4f at 2027 
BIU: 5051525354555657 at 2087 
BIU: 58595a5b5c5d5e5f at 2147 
DATALIST uploaded this data into line 01 at time 2163. 
404142434445464748494a4b4c4d4e4f505152535455565758595a5b5c5d5e5f 
CPU started read from address 00000040 at time 2265. 
Line_mgr selected new ActiveLine = 0 at $d 2287 
DATALIST downloaded this data from line 00 at time 2313. 
404142434445464748494a4b4c4d4e4f505152535455565758595a5b5c5d5e5f 
CPU read: 4041424344454647 at 2356 
CPU read: 48494a4b4c4d4e4f at 2371 
CPU read: 5051525354555657 at 2386 
CPU read: 58595a5b5c5d5e5f at 2401 
BIU started read from address 00000060 at time 2482. 
BIU: 6061626364656667 at 2627 
BIU: 68696a6b6c6d6e6f at 2687 
BIU: 7071727374757677 at 2747 
BIU: 78797a7b7c7d7e7f at 2807 
DATALIST uploaded this data into line 00 at time 2823. 
606162636465666768696a6b6c6d6e6f707172737475767778797a7b7c7d7e7f 
CPU started write to address 000001c0 at time 3007. 
CPU write beat 1: 7777777777777777 at 3022 
Line_mgr selected new ActiveLine = 1 at $d 3037 
Line manager flushed line 1 at time 3048. 
CPU write beat 2: 8888888888888888 at 3158 
CPU write beat 3: 1111111111111111 at 3218 
CPU write beat 4: 3333333333333333 at 3278 
CPU started read from address 00000060 at time 3390. 
Line_mgr selected new ActiveLine = 0 at $d 3412 
DATALIST downloaded this data from line 00 at time 3438. 
606162636465666768696a6b6c6d6e6f707172737475767778797a7b7c7d7e7f 
CPU read: 6061626364656667 at 3481 
CPU read: 68696a6b6c6d6e6f at 3496 
CPU read: 7071727374757677 at 3511 
CPU read: 78797a7b7c7d7e7f at 3526 
BIU started read from address 00000080 at time 3607. 
BIU: 0001020304050607 at 3752 
BIU: 08090a0b0c0d0e0f at 3812 
BIU: 1011121314151617 at 3872 
BIU: 18191a1b1c1d1e1f at 3932 
CPU started read from address 00000100 at time 3945. 
DATALIST uploaded this data into line 00 at time 3948. 
000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f 
Line_mgr selected new ActiveLine = 2 at $d 3982 
CPU read: 0001020304050607 at 4066 
CPU read: 08090a0b0c0d0e0f at 4126 
CPU read: 1011121314151617 at 4186 
CPU read: 18191a1b1c1d1e1f at 4246 


L123 "testbench.v": $finish at simulation time 6010 

1661039 simulation events + 265 accelerated events 

CPU time: 0.8 secs to compile + 0.8 secs to link + 5.0 secs in simulation 
End of VERILOG-XL 2.1.2 Feb 2, 1996 13:22:29 
APPENDIX D. PRC STRUCTURE FILES 


This appendix contains the Verilog files for the final 
hardware design. They include the Verilog structural models 
of the PRC and the testing results. The files are located on 
the ECE system at home5/robert/thesis/epoch/verilog. 


A. PRC 
/*****************************************************************************
* Predictive Read Cache 
* Filename: prc.v 
* Author: Joseph R. Robert, Jr. 
* Date: 02OCT95 
* Revised: 14MAR96 
* Purpose: This module emulates the predictive read cache, connecting all the parts. 
*****************************************************************************/
module prc(HRESET_,clk,BG_,DBG_,BR_,CANX,D,A,DP,TT,AP,TSIZ,TC,ABB_,AACK_,TS_, 
TBST_,DBB_,TA_,DPE_); 


// epoch set_attribute FIXEDBLOCK = 1 


input HRESET_,clk,BG_,DBG_; 

output BR_,CANX; 

inout [63:0] D; 

inout [31:0] A; 

inout [7:0] DP; 

inout [4:0] TT; 

inout [3:0] AP; 

inout [2:0] TSIZ; 

inout [1:0] TC; 

inout ABB_,AACK_,TS_,TBST_,DBB_,TA_,DPE_; 


wire [255:0] DATALINE; 

wire [26:0] CAR,NAR,MRMA; 

wire [6:0] ActiveLine; 

wire [1:0] BURSTSTART; 

wire fetch_done,fetch_abort,send_done,read,write,hit,line_empty, 
snoop_ignore,upload,download,BR_,CANX; 


Low 


tri [63:0] D; 

tri [31:0] A; 

tri [7:0] DP; 

tri [4:0] TT; 

tri [3:0] AP; 

tri [2:0] TSIZ; 

tri [1:0] TC; 

tri ABB_,AACK_,TS_,TBST_,DBB_,TA_,DPE_; 


//Connect parts which have been converted to hardware. 


// epoch pre_compiled predictor 
predictor PRE1(MRMA,CAR[25:0],predict,NAR,HRESET_);

// epoch pre_compiled line_mgr
line_mgr LM1(CAR,NAR,HRESET_,a_select,test,fetch_done,flush,store,
             new_replace,MRMA,ActiveLine,line_empty,hit,clk);

// epoch pre_compiled datalist
datalist DL1(DATALINE,ActiveLine,upload,download);

// epoch pre_compiled snooper
snooper SN1(A,AP,TT,TC,TS_,snoop_ignore,hold,clk,CAR,BURSTSTART,
            read,write,HRESET_);

// epoch pre_compiled bus_interface
bus_interface BIU1(NAR,BURSTSTART,BG_,AACK_,DBG_,send,fetch,
                   clk,BR_,upload,download,fetch_done,fetch_abort,
                   send_done,CANX,snoop_ignore,DATALINE,D,A,AP,DP,DPE_,
                   TT,TSIZ,TC,ABB_,TS_,TBST_,DBB_,TA_,HRESET_);

// epoch pre_compiled controller
controller CON1(HRESET_,read,write,hit,send_done,fetch_done,fetch_abort,
                line_empty,a_select,test,predict,store,
                flush,send,hold,new_replace,fetch,clk);


endmodule 


B. CONTROLLER 
/******************************************************************************
* CONTROLLER
* Filename: controller.v
* Author: Joseph R. Robert, Jr.
* Date: 21DEC95
* Revised: 20MAR96
*
Purpose: This module is a Finite State Machine that coordinates the actions of all the other functional
blocks of the PRC. All control signals are synchronous with the system clock. HRESET_ causes the Controller
to go to the IDLE state. The state diagram and state output tables give more details.

Of significance are the wait states added to the state diagram of the behavioral model. These changes are
highlighted in the State Output Table. The changes were required by the Line Manager, in which there is a
significant propagation delay for the addresses. This is described in more detail in the Line Manager section of this
chapter. Reducing this delay is a prime candidate for future work to improve the PRC's design.
*
******************************************************************************/
module controller (HRESET_,read,write,hit,send_done,fetch_done,fetch_abort,
                   line_empty,a_select,test,predict,store,
                   flush,send,hold,new_replace,fetch,clk);

// epoch set_attribute FIXEDBLOCK = 1


input HRESET_,read,write,hit,send_done,fetch_done,fetch_abort,line_empty,clk; 
output a_select,test,predict,store,flush,send,hold,new_replace,fetch; 


reg a_select,test,predict,store,flush,send,hold,new_replace,fetch;


//Finite State Machine 


parameter // epoch enum stat
    idle          = 5'd0,
    test_car_r    = 5'd1,
    send_data     = 5'd2,
    test_nar      = 5'd3,
    fetch_data    = 5'd4,
    is_line_empty = 5'd5,
    predict_na    = 5'd6,
    store_car     = 5'd7,
    test_car_w    = 5'd8,
    flush_line    = 5'd9,
    wait_a        = 5'd10,
    wait_b        = 5'd11,
    wait_c        = 5'd12,
    wait_d        = 5'd13,
    wait_e        = 5'd14,
    wait_f        = 5'd15,
    wait_g        = 5'd16,
    wait_h        = 5'd17,
    wait_i        = 5'd18,
    dc_state      = 5'bx;

reg [4:0] /* epoch enum stat */ state, next_state;
reg a_select,fetch,flush,hold,new_replace,predict,send,store,test;

always @(posedge clk or negedge HRESET_)
begin
    if (!HRESET_)
        state = idle;
    else
        state = next_state;
end


always @(state or read or write or hit or send_done or line_empty or 
fetch_done or fetch_abort) 


begin 
//default values 
    a_select    = 1'b0; //CAR
    fetch       = 1'b0;
    flush       = 1'b0;
    hold        = 1'b0;
    new_replace = 1'b0;
    predict     = 1'b0;
    send        = 1'b0;
    store       = 1'b0;
    test        = 1'b0;


case (state) 


    idle: //0
        begin
            if (read == 1'b0 & write == 1'b0) next_state = idle;
            else if (read == 1'b0 & write == 1'b1) next_state = wait_d;
            else if (read == 1'b1) next_state = wait_a;
            else next_state = dc_state;
        end

    wait_a: //10
        begin
            hold = 1'b1;
            next_state = wait_b;
        end

    wait_b: //11
        begin
            hold = 1'b1;
            next_state = wait_c;
        end

    wait_c: //12
        begin
            hold = 1'b1;
            next_state = test_car_r;
        end

    wait_d: //13
        begin
            hold = 1'b1;
            next_state = wait_e;
        end

    wait_e: //14
        begin
            hold = 1'b1;
            next_state = wait_f;
        end

    wait_f: //15
        begin
            hold = 1'b1;
            next_state = test_car_w;
        end


    test_car_r: //1
        begin
            test = 1'b1;
            hold = 1'b1;
            if (hit)
                next_state = send_data;
            else next_state = is_line_empty;
        end

    send_data: //2
        begin
            a_select = 1'b1; //NAR
            predict  = 1'b1;
            send     = 1'b1;
            hold     = 1'b1;
            if (send_done)
                next_state = test_nar;
            else next_state = send_data;
        end

    test_nar: //3
        begin
            a_select = 1'b1; //NAR
            test     = 1'b1;
            hold     = 1'b1;
            if ({hit,read,write} == 3'b000) next_state = fetch_data;
            else next_state = idle;
        end

    fetch_data: //4
        begin
            a_select = 1'b1; //NAR
            hold     = 1'b1;
            fetch    = 1'b1;
            if ({fetch_done,fetch_abort} == 2'b00) next_state = fetch_data;
            else next_state = idle;
        end


    is_line_empty: //5
        begin
            hold = 1'b1;
            if (line_empty)
                next_state = store_car;
            else next_state = predict_na;
        end

    predict_na: //6
        begin
            a_select    = 1'b1; //NAR
            predict     = 1'b1;
            hold        = 1'b1;
            new_replace = 1'b1;
            next_state  = wait_g;
        end

    wait_g: //16
        begin
            a_select = 1'b1; //NAR
            hold     = 1'b1;
            next_state = wait_h;
        end

    wait_h: //17
        begin
            a_select = 1'b1; //NAR
            hold     = 1'b1;
            next_state = wait_i;
        end

    wait_i: //18
        begin
            a_select = 1'b1; //NAR
            hold     = 1'b1;
            next_state = test_nar;
        end

    store_car: //7
        begin
            store = 1'b1;
            hold  = 1'b1;
            next_state = idle;
end 
    test_car_w: //8
        begin
            test = 1'b1;
            hold = 1'b1;
            if (hit)
                next_state = flush_line;
            else next_state = idle;
        end

    flush_line: //9
        begin
            flush = 1'b1;
            hold  = 1'b1;
            next_state = idle;
        end

    default:
        begin
            next_state = dc_state;
        end


endcase 
end 


endmodule 
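
The next-state logic of the controller listing can be cross-checked in software. The following is a minimal sketch in Python, used here only because the structural Verilog depends on the Epoch cell library; the state names are taken from controller.v, while the names WAIT_CHAIN, next_state, and run are hypothetical and not part of the thesis files. It models state transitions only, not the output signals.

```python
# Software reference model of the controller's next-state logic.
# A sketch under the assumptions stated above, not the Epoch netlist.

# States whose successor does not depend on any input.
WAIT_CHAIN = {
    "wait_a": "wait_b", "wait_b": "wait_c", "wait_c": "test_car_r",
    "wait_d": "wait_e", "wait_e": "wait_f", "wait_f": "test_car_w",
    "predict_na": "wait_g", "wait_g": "wait_h", "wait_h": "wait_i",
    "wait_i": "test_nar", "store_car": "idle", "flush_line": "idle",
}


def next_state(state, read=0, write=0, hit=0, send_done=0,
               fetch_done=0, fetch_abort=0, line_empty=0):
    """Return the successor of `state` for the given input values."""
    if state in WAIT_CHAIN:
        return WAIT_CHAIN[state]
    if state == "idle":
        if read:
            return "wait_a"          # read path: three wait states, then test_car_r
        return "wait_d" if write else "idle"
    if state == "test_car_r":
        return "send_data" if hit else "is_line_empty"
    if state == "send_data":
        return "test_nar" if send_done else "send_data"
    if state == "test_nar":
        # Prefetch only if the NAR misses and no new transaction is pending.
        return "idle" if (hit or read or write) else "fetch_data"
    if state == "fetch_data":
        return "idle" if (fetch_done or fetch_abort) else "fetch_data"
    if state == "is_line_empty":
        return "store_car" if line_empty else "predict_na"
    if state == "test_car_w":
        return "flush_line" if hit else "idle"
    raise ValueError("unknown state: %s" % state)


def run(state, steps, **inputs):
    """Apply constant inputs for `steps` clock edges and return the visited states."""
    trace = [state]
    for _ in range(steps):
        state = next_state(state, **inputs)
        trace.append(state)
    return trace
```

For example, run("idle", 5, read=1, hit=1) walks the read path through the three wait states that were added for the Line Manager delay before reaching test_car_r.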


C. SNOOPER 
/******************************************************************************
* SNOOPER
* Filename: snooper.v
* Author: Joseph R. Robert, Jr.
* Date: 21DEC95
* Revised: OOMARI6
*
Purpose: This module watches the system bus activity and makes appropriate reports to the PRC
Controller.

If the transaction is a data burst read or any kind of write, and if the address parity is correct, then the read
or write signal is asserted as appropriate, and the address is placed in the CAR. The snoop_ignore signal tells this
unit to ignore the current transaction, because it was initiated by the Bus Interface Unit. The snoop_ignore signal
must be asserted concurrently with the transfer attributes. Reads that are not burst or data related are ignored by
the PRC. The CAR is updated only on transactions relevant to the PRC.

Due to the two-stage pipelining capability of the PowerPC, with respect to memory accesses, a second
address tenure can occur shortly after the first, well before the first data tenure is complete. To compensate for this,
the read and write outputs of the Snooper remain asserted until acknowledged by the Controller with hold. The
rising edge of hold indicates that the read or write signal was received by the Controller. The Snooper can then
negate these signals, but must leave CAR alone until hold is negated. After hold is negated, CAR can be updated
to the new address.

In Stage 0, the transfer attributes are latched in registers. Combinational logic determines if these transfer
attributes represent a valid read or a valid write, and if the address parity is correct. If the transaction is valid,
and one that the PRC is interested in, then Stage 0 raises a transaction_waiting signal.

A Finite State Machine in Stage 1 sits in the IDLE state until it receives that signal. Then it latches
the signals needed from Stage 0, resets the transaction_waiting signal, and then waits for the hold signal to go low.
A high hold signal indicates that the PRC is not done with the previous transaction. Once hold goes low, the read
and write flags are set according to the type of the current transaction. Also, the input address is stored in the
Current Address Register. The FSM then waits for the rising edge of hold before returning to the IDLE state, where
it can check if there is another transaction waiting.
******************************************************************************/
module snooper (A,AP,TT,TC,TS_,snoop_ignore,hold,clk,CAR,BURSTSTART,
                read_flag,write_flag,HRESET_);

// epoch set_attribute FIXEDBLOCK = 1


input [31:0] A; 

input [3:0] AP; 

input [4:0] TT; 

input [1:0] TC; 

input TS_,snoop_ignore,hold,clk,HRESET_;
output [26:0] CAR; 

output [1:0] BURSTSTART; 

output read_flag,write_flag; 


wire [31:0] address0;

wire [28:0] address1;

wire [26:0] CAR; 

wire [4:0] TransferType; 

wire [3:0] addr_parity; 

wire [1:0] BURSTSTART, TransferCode; 

wire car_latch,flag_reset_,hold_,ignore,latch0,latch1,parity_error,
     read_flag,read_set_,TS,transaction_waiting,tw_set,tw_reset_,
     valid_op,valid_read0,valid_read1,valid_write0,valid_write1,
     w1,w2,w3,w5,w6,w7,write_flag_,write_set_,prelatch0;


//STAGE 0

//Stage 0 latches
stdinv TS_INV (TS_,TS);
stddff TS_Latch (.CLK(clk),.D(TS),.Q(prelatch0));
stdbuf Latch0Buffer (.IN0(prelatch0),.Y(latch0));
dff #(32,0,"AUTO","1") AddressLatch0 (.CLK(latch0),.D(A),.Q(address0));
dff #( 4,0,"AUTO","1") AddrParityLatch (.CLK(latch0),.D(AP),.Q(addr_parity));
dff #( 5,0,"AUTO","1") TransferTypeLatch (.CLK(latch0),.D(TT),.Q(TransferType));
dff #( 2,0,"AUTO","1") TransferCodeLatch (.CLK(latch0),.D(TC),.Q(TransferCode));
stddff IgnoreLatch (.CLK(latch0),.D(snoop_ignore),.Q(ignore));
//Odd parity checker
parityo_chk32
    OddParityChecker (.D(address0),.PIN(addr_parity),.ERROR(parity_error));

//Read checker
stdnor2 NOR_C (TransferCode[1],TransferCode[0],w1);
stdinv INV_F (TransferType[0],TransferType0_);
stdand4 AND_D (TransferType[3],TransferType[2],TransferType[1],TransferType0_,
               w2);
stdand2 AND_E (w1,w2,valid_read0);

//Write checker
stdinv INV_J (TransferType[3],TransferType3_);
stdnand2 NAND_H (TransferType[4],TransferType[2],w3);
stdand4 AND_G (TransferType3_,TransferType[1],TransferType0_,w3,valid_write0);

//Transaction checker
stdnor2 NOR_L (valid_write0,valid_read0,w5);
stdnor3 NOR_M (parity_error,w5,ignore,valid_transaction);

//Transaction Waiting Latch
stdand2 TW_SetAND (latch0,valid_transaction,tw_set);
stdand2 TW_ResetAND (tw_reset1_,HRESET_,tw_reset_);
stdlatch_c TW_Latch
    (.D(tw_set),.CLR(tw_reset_),.EN(latch0),.Q(transaction_waiting));


//STAGE 1

//Stage 1 latches
dff #(29,0,"AUTO","1")
    AddressLatch1 (.CLK(latch1),.D(address0[31:3]),.Q(address1));
stddff ValidReadLatch1 (.CLK(latch1),.D(valid_read0),.Q(valid_read1));
stddff ValidWriteLatch1 (.CLK(latch1),.D(valid_write0),.Q(valid_write1));

//read and write flags
stdinv HOLD_INV (hold,hold_);
stdand2 FLAG_RESET_AND (.IN0(hold_),.IN1(HRESET_),.Y(flag_reset_));
stddff_c ReadFlagLatch
    (.CLK(flag_clk),.CLR(flag_reset_),.D(valid_read1),.Q(read_flag));
stddff_c WriteFlagLatch
    (.CLK(flag_clk),.CLR(flag_reset_),.D(valid_write1),.Q(write_flag));

//Current Address Register
dff #(29,0,"AUTO","1")
    CA_Register (.CLK(car_latch),.D(address1),.Q({CAR,BURSTSTART}));


//FINITE STATE MACHINE

parameter // epoch enum stat
    IDLE              = 3'd0,
    LATCH             = 3'd1,
    OUTPUTS           = 3'd2,
    WAIT_FOR_HOLD     = 3'd3,
    WAIT_FOR_NOT_HOLD = 3'd4,
    dc_state          = 3'bxx;

reg [2:0] /* epoch enum stat */ state, next_state;
reg latch1,tw_reset1_,flag_clk,car_latch;

always @(posedge clk or negedge HRESET_)
begin
    if (!HRESET_)
        state = IDLE;
    else
        state = next_state;
end


always @(state or transaction_waiting or hold) 
begin 


//default values 
    latch1     = 1'b0;
    tw_reset1_ = 1'b1;
    flag_clk   = 1'b0;
    car_latch  = 1'b0;


case (state) 


IDLE: begin 
if (transaction_waiting) 
next_state = LATCH; 
else next_state = IDLE; 
end 


LATCH: begin 
        latch1     = 1'b1;
        tw_reset1_ = 1'b0;
if (hold) 
next_state = WAIT_FOR_NOT_HOLD; 
else next_state = OUTPUTS; 
end 


WAIT_FOR_NOT_HOLD: begin 
if (hold) 
next_state = WAIT_FOR_NOT_HOLD; 
else next_state = OUTPUTS; 
end 


OUTPUTS: begin 


        flag_clk  = 1'b1;
        car_latch = 1'b1;

next_state = WAIT_FOR_HOLD; 
end 


WAIT_FOR_HOLD: begin 
if (hold) 
next_state = IDLE; 
else next_state = WAIT_FOR_HOLD; 
end 


default: begin 
        next_state = dc_state;
end 


endcase 
end 


endmodule 
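
The Stage 0 combinational checks of the Snooper can be restated as Boolean functions. The sketch below, in Python, mirrors the gates of the listing (NOR_C, AND_D, AND_E, NAND_H, AND_G, NOR_L, NOR_M); the function names are hypothetical, and the bit positions follow the Verilog vector indices TT[4:0] and TC[1:0] as written, with no claim about how they map onto the PowerPC bus numbering.

```python
# Reference model of the Snooper's Stage 0 transaction checker.
# A sketch that follows the gate listing; names are illustrative only.

def _bit(value, index):
    """Return bit `index` of an integer, using Verilog-style [n:0] indexing."""
    return (value >> index) & 1


def valid_read(tt, tc):
    """AND_E: TC == 00 and TT[3:0] == 1110 (TT[4] is not examined)."""
    w1 = tc & 0b11 == 0                                   # NOR_C
    w2 = (_bit(tt, 3) and _bit(tt, 2) and _bit(tt, 1)
          and not _bit(tt, 0))                            # AND_D
    return bool(w1 and w2)


def valid_write(tt):
    """AND_G: TT[3] == 0, TT[1] == 1, TT[0] == 0, and not (TT[4] and TT[2])."""
    w3 = not (_bit(tt, 4) and _bit(tt, 2))                # NAND_H
    return bool(not _bit(tt, 3) and _bit(tt, 1)
                and not _bit(tt, 0) and w3)


def valid_transaction(tt, tc, parity_error=False, ignore=False):
    """NOR_M: a valid read or write, with good parity, not issued by the BIU."""
    w5 = not (valid_write(tt) or valid_read(tt, tc))      # NOR_L
    return not (parity_error or w5 or ignore)
```

A transaction that the Bus Interface Unit itself started (ignore high) or one with an address parity error is rejected here, so it never raises transaction_waiting.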


1. Thirty-Two-Input, Odd-Parity Checker

/******************************************************************************
* ODD PARITY CHECKER
* Filename: parityo_chk32.v
* Author: Joseph R. Robert, Jr.
* Date: 12FEB96
* Revised: 12FEB96
*
Purpose: This module checks the parity of the input data, comparing it to the input parity. Parity is odd including
the parity bit.
******************************************************************************/
module parityo_chk32 (D,PIN,ERROR);


input [31:0] D; 
input [3:0] PIN; 
output ERROR; 


wire ERROR_0,ERROR_1,ERROR_2,ERROR_3,ERROR;

parityco #(8,0,"AUTO","1")
    parity_group_0 (.D(D[ 7: 0]),.PIN(PIN[0]),.ERROR(ERROR_0));
parityco #(8,0,"AUTO","1")
    parity_group_1 (.D(D[15: 8]),.PIN(PIN[1]),.ERROR(ERROR_1));
parityco #(8,0,"AUTO","1")
    parity_group_2 (.D(D[23:16]),.PIN(PIN[2]),.ERROR(ERROR_2));
parityco #(8,0,"AUTO","1")
    parity_group_3 (.D(D[31:24]),.PIN(PIN[3]),.ERROR(ERROR_3));

stdor4 OR_A (ERROR_0,ERROR_1,ERROR_2,ERROR_3,ERROR);


endmodule 
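
The byte-wise odd-parity rule stated in the header can be illustrated in software. This is a minimal Python sketch; it assumes that each parityco cell raises ERROR when the eight data bits plus the parity bit contain an even number of ones, which is inferred from the header comment rather than from the Epoch cell documentation. The function name parity_error is hypothetical.

```python
# Reference model of a 32-bit, byte-wise odd-parity check.
# Assumption: a group is in error when (ones in byte + parity bit) is even.

def parity_error(d, pin):
    """Return True if any of the four byte groups of `d` fails odd parity.

    d   -- 32-bit data word; group k covers bits [8k+7 : 8k]
    pin -- 4-bit parity input; bit k belongs to group k
    """
    for group in range(4):
        byte = (d >> (8 * group)) & 0xFF
        parity_bit = (pin >> group) & 1
        ones = bin(byte).count("1") + parity_bit
        if ones % 2 == 0:          # even total violates odd parity
            return True            # OR_A: any group error flags the word
    return False
```

For instance, an all-zero word needs all four parity bits set, so parity_error(0, 0b1111) is False while parity_error(0, 0) is True.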


D. LINE MANAGER 
/******************************************************************************
LINE MANAGER
Filename: line_mgr.v
Author: Joseph R. Robert, Jr.
Date: 21DEC95
Revised: 20MAR96

Purpose: The function of this module is completely described in the behavioral model.

This structural model uses a high speed RAM (hsram) for the MRMA List. The CAR is stored into this
RAM on a store or fetch_done signal.

The predicted_ma_list is a register file for storing predicted memory addresses. This list is composed
of 128 address registers, 128 equality comparators, and 128 Valid status flags. The NAR is stored in this list at
the fetch_done pulse. If there is a match with the input address (in_addr), a priority encoder (ENC_C) determines
which line matches.

The line replacement unit determines the next line to be replaced whenever the PRC needs to start a new
line. It first selects invalid lines. If all the lines are valid, then it selects lines that have been "aged". A priority
encoder (ENC_1) chooses the line with the lowest index among all the lines that can be replaced. If all lines are
valid, the encoder's output enable (oe) signal is used to cause aging.

Aging is accomplished by the use of a 7-bit counter (ager_counter), initially set to zero. When the
cause_aging signal from the encoder is high, the counter advances. A decoder (DEC_B) output causes the
appropriate Aged flag to be set.

A change in the value of the CAR or NAR takes 25 ns (1.8 cycles) to propagate through the input
address multiplexer (in_addr mux). This required the addition of wait states in the Controller before each of the
tests.

The Revised Controller State Diagram and Revised Controller State Output Table show the required changes.
******************************************************************************/
module line_mgr (CAR,NAR,HRESET_,a_select,test,fetch_done,flush,store,
                 new_replace,MRMA_out,ActiveLine,line_empty,hit,clk);
// epoch set_attribute FIXEDBLOCK = 1 


input [26:0] CAR,NAR; 

input HRESET_,a_select,test,fetch_done,flush,store,new_replace,clk;
output [26:0] MRMA_out;

output [6:0] ActiveLine;

output line_empty,hit;


wire [127:0] Valid; 

wire [26:0] in_addr,in_addr_buf,MRMA_out;

wire [6:0] ActiveLine,ReplaceLine,match_line,w1;

wire MRMA_write,match,all_lines_valid,a_select_,w2,hit,
hreset,store_,line_empty,le_set_; 


//Address multiplexer 
mux2 #(27,0,"AUTO","1")
    addrmux (.IN0(CAR),.IN1(NAR),.S0(a_select),.Y(in_addr));
buff #(27,0,"AUTO","20") InAddrBuffer (.IN0(in_addr),.Y(in_addr_buf));

//MRMA_list
stdnor2 MRMA_NOR (.IN0(store),.IN1(fetch_done),.Y(MRMA_write_));
hsram #(27,128,7,32,1,"2")
    MRMA_list (.A(ActiveLine),.DIN(CAR),.WR(MRMA_write_),.DOUT(MRMA_out));

//PredMA_list
// epoch pre_compiled predicted_ma_list
predicted_ma_list
    PredMA_list1 (NAR,in_addr_buf,ActiveLine,fetch_done,flush,HRESET_,
                  Valid,match_line,match);
and128 all_valid_ands (Valid,all_lines_valid);

//Line Replacement Unit

// epoch pre_compiled line_replacement_unit 

line_replacement_unit LRU1(Valid,ActiveLine,all_lines_valid, 
new_replace,fetch_done,HRESET_,clk,ReplaceLine); 


//ActiveLine pointer 
stdbufinv a_select_inv(.IN0(a_select),.Y(a_select_));
stdand2 AL_AND (.IN0(test),.IN1(a_select_),.Y(w2));
mux2 #(7,0,"AUTO","1")
    al_mux (.IN0(ReplaceLine),.IN1(match_line),.S0(match),.Y(w1));
dff_c #(7,0,"AUTO","1")
    ActiveLineReg (.CLK(w2),.CLR(HRESET_),.D(w1),.Q(ActiveLine));

//Hit status flag
stdlatch hit_latch(.D(match),.EN(test),.Q(hit));

//line_empty status flag
stdbufinv HRESET_inv(.IN0(HRESET_),.Y(hreset));
stdbufinv store_inv(.IN0(store),.Y(store_));
stdnor2 LE_NOR (.IN0(hreset),.IN1(new_replace),.Y(le_set_));
srlatch line_empty_latch(.S_(le_set_),.R_(store_),.Q(line_empty));


endmodule 
1. Address Register With Equal Comparator

/******************************************************************************
ADDRESS REGISTER WITH EQUALITY COMPARATOR for PredMA storage
Filename: addre.v
Author: Joseph R. Robert, Jr.
Date: 21DEC95
Revised: 13FEB96

Purpose: This structural model is a building block for the Predicted Memory Address List (PredMA_List). It
consists of a single 27-bit register and an equality comparator. The output of the register is compared with the
input address (in_addr).
******************************************************************************/
module addre (NAR,in_addr,store_enable,eq,HRESET_);
// epoch set_attribute FIXEDBLOCK = 1


input [26:0] NAR,in_addr; 
input store_enable, HRESET_; 
output eq; 


wire [26:0] w1;
wire eq; 


dff_c #(27,0,"AUTO","1") PredMA_reg (.CLK(store_enable),.CLR(HRESET_),
                                     .D(NAR),.Q(w1));
equal #(27,0,"AUTO","1") equal1 (.A(w1),.B(in_addr),.Y(eq));


endmodule 


2. AND Gate With 128 Inputs and One Output

/******************************************************************************
128-INPUT AND GATE
Filename: and128.v
Author: Joseph R. Robert, Jr.
Date: 21DEC95
Revised: 20MAR96

Purpose: This structural model is a 128-input AND gate.
******************************************************************************/
module and128 (in,out);

input [127:0] in;


output out; 
wire out,out_unbuffered; 


wire [31:0] A; 
wire [7:0] B; 
wire [1:0] C; 


and4 #(32,0,"AUTO","1") AND_A (.IN0(in[127:96]),.IN1(in[95:64]),
                               .IN2(in[63:32]),.IN3(in[31:0]),.Y(A));

and4 #( 8,0,"AUTO","1") AND_B (.IN0(A[31:24]),.IN1(A[23:16]),
                               .IN2(A[15:8]),.IN3(A[7:0]),.Y(B));

and4 #( 2,0,"AUTO","1") AND_C (.IN0(B[7:6]),.IN1(B[5:4]),
                               .IN2(B[3:2]),.IN3(B[1:0]),.Y(C));

stdand2 AND_D (.IN0(C[0]),.IN1(C[1]),.Y(out_unbuffered));

stdbuf #("15") OutputBuffer (.IN0(out_unbuffered),.Y(out));


endmodule 


3. Codefile for Seven-to-128 Decoder
(dec7to128e.codefile)


//PLA TABLE for 7 to 128 decoder with enable
//in0 in1 in2 in3 in4 in5 in6 EN

00000001 // line 0
00000011 // line 1
00000101 // line 2
00000111 // line 3
00001001 // line 4
00001011 // line 5
00001101 // line 6
00001111 // line 7
00010001 // line 8
00010011 // line 9
00010101 // line 10
00010111 // line 11
00011001 // line 12
00011011 // line 13
00011101 // line 14
00011111 // line 15
00100001 // line 16
00100011 // line 17
00100101 // line 18
00100111 // line 19
00101001 // line 20
00101011 // line 21
00101101 // line 22
00101111 // line 23
00110001 // line 24
00110011 // line 25
00110101 // line 26
00110111 // line 27
00111001 // line 28
00111011 // line 29
00111101 // line 30
00111111 // line 31
01000001 // line 32
01000011 // line 33
01000101 // line 34
01000111 // line 35
01001001 // line 36
01001011 // line 37
01001101 // line 38
01001111 // line 39
01010001 // line 40
01010011 // line 41
01010101 // line 42
01010111 // line 43
01011001 // line 44
01011011 // line 45
01011101 // line 46
01011111 // line 47
01100001 // line 48
01100011 // line 49
01100101 // line 50
01100111 // line 51
01101001 // line 52
01101011 // line 53
01101101 // line 54
01101111 // line 55
01110001 // line 56
01110011 // line 57
01110101 // line 58
01110111 // line 59
01111001 // line 60
01111011 // line 61
01111101 // line 62
01111111 // line 63
10000001 // line 64
10000011 // line 65
10000101 // line 66
10000111 // line 67
10001001 // line 68
10001011 // line 69
10001101 // line 70
10001111 // line 71
10010001 // line 72
10010011 // line 73
10010101 // line 74
10010111 // line 75
10011001 // line 76
10011011 // line 77
10011101 // line 78
10011111 // line 79
10100001 // line 80
10100011 // line 81
10100101 // line 82
10100111 // line 83
10101001 // line 84
10101011 // line 85
10101101 // line 86
10101111 // line 87
10110001 // line 88
10110011 // line 89
10110101 // line 90
10110111 // line 91
10111001 // line 92
10111011 // line 93
10111101 // line 94
10111111 // line 95
11000001 // line 96
11000011 // line 97
11000101 // line 98
11000111 // line 99
11001001 // line 100
11001011 // line 101
11001101 // line 102
11001111 // line 103
11010001 // line 104
11010011 // line 105
11010101 // line 106
11010111 // line 107
11011001 // line 108
11011011 // line 109
11011101 // line 110
11011111 // line 111
11100001 // line 112
11100011 // line 113
11100101 // line 114
11100111 // line 115
11101001 // line 116
11101011 // line 117
11101101 // line 118
11101111 // line 119
11110001 // line 120
11110011 // line 121
11110101 // line 122
11110111 // line 123
11111001 // line 124
11111011 // line 125
11111101 // line 126
11111111 // line 127
//END TABLE
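
Because every row of the codefile is the seven-bit line number followed by an enable bit of 1, the table is regular enough to regenerate mechanically. The following Python sketch shows one way to do so; the function name codefile_rows and its parameter are hypothetical, and the script is offered as a cross-check of the table, not as the tool that produced the thesis file.

```python
# Regenerate the rows of a decoder-with-enable codefile.
# Each row is the binary line number (MSB first) followed by EN = 1.

def codefile_rows(address_bits=7):
    """Return the list of row strings for a 2**address_bits line decoder."""
    rows = []
    for line in range(2 ** address_bits):
        select = format(line, "0{}b".format(address_bits))
        rows.append(select + "1")      # trailing 1 is the enable column
    return rows


if __name__ == "__main__":
    for number, row in enumerate(codefile_rows()):
        print("{} // line {}".format(row, number))
```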


4. One-Hundred-and-Twenty-Eight-Input, Seven-Output
Encoder, Priority to Low Bits

/******************************************************************************
128 TO 7 ENCODER, PRIORITY LOW
Filename: enc128to7lo.v
Author: Joseph R. Robert, Jr.
Date: 21DEC95
Revised: 13FEB96

Purpose: This structural model is a 128-bit input, 7-bit output priority encoder. The highest priority is given to
the bit with the lowest index. Inputs and outputs are active high. It is composed of four 32 to 5 priority encoders
and the logic gates necessary to connect them together.
******************************************************************************/
module enc128to7lo (I,A,ei,eo,gs);
// epoch set_attribute FIXEDBLOCK = 1 


input [127:0] I;
input ei;
output [6:0] A;
output gs,eo;

wire [4:0] g0A,g1A,g2A,g3A;
wire g3eo,g2eo,g1eo,g3gs,g2gs,g1gs,g0gs,eo,gs;

enc32to5lo ENCg3 (I[127:96],g3A,g2eo,  eo,g3gs);
enc32to5lo ENCg2 (I[ 95:64],g2A,g1eo,g2eo,g2gs);
enc32to5lo ENCg1 (I[ 63:32],g1A,g0eo,g1eo,g1gs);
enc32to5lo ENCg0 (I[ 31: 0],g0A,  ei,g0eo,g0gs);

//Group Select
stdor4 OR_A (g3gs,g2gs,g1gs,g0gs,gs);

//A6
stdor2 OR_B (g3gs,g2gs,A[6]);

//A5
stdor2 OR_C (g3gs,g1gs,A[5]);

//A4 - A0
stdor4 OR_D (g0A[4],g1A[4],g2A[4],g3A[4],A[4]);
stdor4 OR_E (g0A[3],g1A[3],g2A[3],g3A[3],A[3]);
stdor4 OR_F (g0A[2],g1A[2],g2A[2],g3A[2],A[2]);
stdor4 OR_G (g0A[1],g1A[1],g2A[1],g3A[1],A[1]);
stdor4 OR_H (g0A[0],g1A[0],g2A[0],g3A[0],A[0]);


endmodule 
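
The way four narrow encoders are chained into a wider one can be sketched in software. In the Python model below, each group encoder is enabled by the enable output of the group beneath it, so at most one group reports a hit; the two upper address bits are then formed from the group-select signals and the lower bits by ORing the group addresses. The names priority_encode and cascade4 are hypothetical, and priority_encode is a behavioral stand-in for the lower-level encoder rather than a model of its gates.

```python
# Reference model of a four-group, low-priority encoder cascade.
# A sketch of the structure used by enc128to7lo and enc32to5lo.

def priority_encode(bits, width, ei=1):
    """Behavioral encoder: return (A, gs, eo) for the lowest set bit."""
    if not ei:
        return 0, 0, 0                 # disabled: all outputs low
    for k in range(width):
        if (bits >> k) & 1:
            return k, 1, 0             # found an input: gs high, eo low
    return 0, 0, 1                     # nothing found: pass enable onward


def cascade4(bits, group_width, ei=1):
    """Combine four group encoders of `group_width` inputs each."""
    mask = (1 << group_width) - 1
    shift = group_width.bit_length() - 1      # log2(group_width)
    a_low = 0
    gs_list = []
    enable = ei                               # group 0 gets the external ei
    for g in range(4):
        group_bits = (bits >> (g * group_width)) & mask
        a, gs, eo = priority_encode(group_bits, group_width, enable)
        a_low |= a                            # OR_D .. OR_H
        gs_list.append(gs)
        enable = eo                           # eo of group g feeds ei of group g+1
    hi1 = gs_list[3] | gs_list[2]             # OR_B
    hi0 = gs_list[3] | gs_list[1]             # OR_C
    gs = gs_list[0] | gs_list[1] | gs_list[2] | gs_list[3]   # OR_A
    address = (hi1 << (shift + 1)) | (hi0 << shift) | a_low
    return address, gs, enable                # final enable is group 3's eo
```

With group_width set to 32 the function corresponds to the 128 to 7 arrangement, and with group_width set to 8 to the 32 to 5 arrangement.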


5. Thirty-Two-Input, Five-Output Encoder, Priority to
Low Bits

/******************************************************************************
32 TO 5 ENCODER, PRIORITY LOW
Filename: enc32to5lo.v
Author: Joseph R. Robert, Jr.
Date: 21DEC95
Revised: 13FEB96

Purpose: This structural model is a 32-bit input, 5-bit output priority encoder. The highest priority is given to the
bit with the lowest index. Inputs and outputs are active high.

This module is composed of four 8 to 3 priority encoders and the logic gates necessary to connect them
together. This module is a building block for the 128 to 7 priority encoder.
******************************************************************************/
module enc32to5lo (I,A,ei,eo,gs);
// epoch set_attribute FIXEDBLOCK = 1 


input [31:0] I;
input ei;
output [4:0] A;
output gs,eo;

wire [2:0] g0A,g1A,g2A,g3A;
wire g3eo,g2eo,g1eo,g3gs,g2gs,g1gs,g0gs,eo,gs;

enc8to3lo ENCg3 (I[31:24],g3A,g2eo,  eo,g3gs);
enc8to3lo ENCg2 (I[23:16],g2A,g1eo,g2eo,g2gs);
enc8to3lo ENCg1 (I[15: 8],g1A,g0eo,g1eo,g1gs);
enc8to3lo ENCg0 (I[ 7: 0],g0A,  ei,g0eo,g0gs);

//Group Select
stdor4 OR_A (g3gs,g2gs,g1gs,g0gs,gs);

//A4
stdor2 OR_B (g3gs,g2gs,A[4]);

//A3
stdor2 OR_C (g3gs,g1gs,A[3]);

//A2 - A0
stdor4 OR_D (g0A[2],g1A[2],g2A[2],g3A[2],A[2]);
stdor4 OR_E (g0A[1],g1A[1],g2A[1],g3A[1],A[1]);
stdor4 OR_F (g0A[0],g1A[0],g2A[0],g3A[0],A[0]);


endmodule 


6. Eight-Input, Three-Output Encoder, Priority to Low 
Bits 
/******************************************************************************
8 TO 3 ENCODER, PRIORITY LOW
Filename: enc8to3lo.v
Author: Joseph R. Robert, Jr.
Date: 21DEC95
Revised: 13FEB96

Purpose: This structural model is an 8-bit input, 3-bit output priority encoder. The highest priority is given to the
bit with the lowest index. Inputs and outputs are active high.

Truth table

          Inputs                     Outputs
EI  I7 I6 I5 I4 I3 I2 I1 I0    A2 A1 A0  GS EO

 0   x  x  x  x  x  x  x  x     0  0  0   0  0
 1   1  0  0  0  0  0  0  0     1  1  1   1  0
 1   x  1  0  0  0  0  0  0     1  1  0   1  0
 1   x  x  1  0  0  0  0  0     1  0  1   1  0
 1   x  x  x  1  0  0  0  0     1  0  0   1  0
 1   x  x  x  x  1  0  0  0     0  1  1   1  0
 1   x  x  x  x  x  1  0  0     0  1  0   1  0
 1   x  x  x  x  x  x  1  0     0  0  1   1  0
 1   x  x  x  x  x  x  x  1     0  0  0   1  0
 1   0  0  0  0  0  0  0  0     0  0  0   0  1
******************************************************************************/
module enc8to3lo (I,A,EI,EO,GS);
// epoch set_attribute FIXEDBLOCK = 1


input [7:0] I; 
input EI; 
output [2:0] A; 
output GS,EO; 


wire [7:0] I_;

wire [2:0] A;

wire EI_,GS,EO,EO_;
supply1 VDD;

//Standard cell implementation is more efficient here. See User Man. 5-34.
inv #(8,0,"AUTO","1") INV1 (.IN0(I),.Y(I_));
stdinv INV_AA (.IN0(EI),.Y(EI_));

//Enable Output
stdand4 AND_A (EI,I_[7],I_[6],I_[5],w1);
stdand4 AND_B (I_[4],I_[3],I_[2],I_[1],w2);
stdand3 AND_C (I_[0],w1,w2,EO);


//Group Select ("Got Something") 
stdnor2 NOR_D (EI_,EO,GS); 


//encode A2 = EI.(I7.I6_.I5_.I4_.I3_.I2_.I1_.I0_ + I6.I5_.I4_.I3_.I2_.I1_.I0_ +
//                I5.I4_.I3_.I2_.I1_.I0_ + I4.I3_.I2_.I1_.I0_)
//          = EI.(I7.I3_.I2_.I1_.I0_ + I6.I3_.I2_.I1_.I0_ +
//                I5.I3_.I2_.I1_.I0_ + I4.I3_.I2_.I1_.I0_)
//          = EI.I3_.I2_.I1_.I0_.(I7 + I6 + I5 + I4)

stdor4 OR_E (I[7],I[6],I[5],I[4],w5);
stdand4 AND_F (EI,I_[3],I_[2],I_[1],w6);
stdand3 AND_FA (I_[0],w5,w6,A[2]);


//encode A1 = EI.(I7.I6_.I5_.I4_.I3_.I2_.I1_.I0_ + I6.I5_.I4_.I3_.I2_.I1_.I0_ +
//                I3.I2_.I1_.I0_ + I2.I1_.I0_)
//          = EI.I1_.I0_.(I7.I5_.I4_ + I6.I5_.I4_ + I3 + I2)

stdand3 AND_G (I[7],I_[5],I_[4],w10);
stdand3 AND_H (I[6],I_[5],I_[4],w11);
stdor4 OR_I (w10,w11,I[3],I[2],w12);
stdand4 AND_J (EI,I_[1],I_[0],w12,A[1]);


//encode A0 = EI.(I7.I6_.I5_.I4_.I3_.I2_.I1_.I0_ + I5.I4_.I3_.I2_.I1_.I0_ +
//                I3.I2_.I1_.I0_ + I1.I0_)
//          = EI.I0_.(I7.I6_.I4_.I2_ + I5.I4_.I2_ + I3.I2_ + I1)

stdand4 AND_K (I[7],I_[6],I_[4],I_[2],w15);
stdand3 AND_L (I[5],I_[4],I_[2],w16);
stdand2 AND_M (I[3],I_[2],w17);

stdor4 OR_N (w15,w16,w17,I[1],w18);
stdand3 AND_P (EI,I_[0],w18,A[0]);


endmodule 
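
The simplified output equations in the comments of the listing can be checked exhaustively against the intended behavior, which is to report the index of the lowest set input. The Python sketch below evaluates the equations for EO, GS, A2, A1, and A0 as written; the function name enc8to3lo_model is hypothetical.

```python
# Reference model of the 8 to 3 priority-low encoder equations.
# b[k] is input Ik; n[k] is its complement Ik_.

def enc8to3lo_model(i, ei=1):
    """Return (A, GS, EO) for the 8-bit input `i` and enable input `ei`."""
    b = [(i >> k) & 1 for k in range(8)]
    n = [1 - x for x in b]
    # Enable output: enabled and no input asserted (AND_A, AND_B, AND_C).
    eo = ei & n[7] & n[6] & n[5] & n[4] & n[3] & n[2] & n[1] & n[0]
    # Group select: enabled and something asserted (NOR_D).
    gs = ei & (1 - eo)
    a2 = ei & n[3] & n[2] & n[1] & n[0] & (b[7] | b[6] | b[5] | b[4])
    a1 = ei & n[1] & n[0] & ((b[7] & n[5] & n[4]) | (b[6] & n[5] & n[4])
                             | b[3] | b[2])
    a0 = ei & n[0] & ((b[7] & n[6] & n[4] & n[2]) | (b[5] & n[4] & n[2])
                      | (b[3] & n[2]) | b[1])
    return (a2 << 2) | (a1 << 1) | a0, gs, eo


def lowest_set_bit(i):
    """Index of the least significant 1 in a nonzero integer."""
    return (i & -i).bit_length() - 1
```

Comparing enc8to3lo_model(i)[0] with lowest_set_bit(i) for every nonzero eight-bit value of i exercises all rows of the truth table, including the don't-care positions.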


7. Line Replacement Unit

/******************************************************************************
LINE REPLACEMENT UNIT
Filename: line_replacement_unit.v
Author: Joseph R. Robert, Jr.
Date: 21DEC95
Revised: 13FEB96

Purpose: This structural model determines the next line to be replaced whenever the PRC needs to start a new
line. It first selects invalid lines. If all the lines are valid, then it selects lines that have been "aged". A priority
encoder (ENC_1) chooses the line with the lowest index among all the lines that can be replaced. If all lines are
valid, the encoder's output enable (oe) signal is used to cause aging. A line X can be replaced if the following holds
true for that line:

    not (X=ActiveLine) AND {not Valid[X] OR (all_lines_valid AND Aged[X])}

Aging is accomplished by the use of a 7-bit counter (ager_counter), initially set to zero. When the
cause_aging signal from the encoder is high, the counter advances. A decoder (DEC_B) output causes the
appropriate Aged flag to be set.
******************************************************************************/
module line_replacement_unit(Valid,ActiveLine,all_lines_valid,
                             new_replace,fetch_done,HRESET_,CLK,ReplaceLine);
// epoch set_attribute FIXEDBLOCK = 1


input [127:0] Valid; 

input [6:0] ActiveLine; 

input all_lines_valid,new_replace,fetch_done, HRESET_,CLK; 
output [6:0] ReplaceLine; 


supply1 Vdd; 

wire [127:0] w1,w2,w4,w5,w6,w7,set_,reset_,Aged,fetch_done128,
             all_lines_valid128,HRESET128_;

wire [6:0] ager_line,ReplaceLine,HRESET7_;

wire ager_en,cause_aging,latch_en,latch_en_buf,nc1,nc2;

split128 fetch_done_split (fetch_done,fetch_done128);
split128 alv_split (all_lines_valid,all_lines_valid128);
split128 HRESET_split (HRESET_,HRESET128_);
split7 HRESET_split7 (HRESET_,HRESET7_);

decoder #(8,128,"verilog/dec7to128e.codefile","2")
    DEC_A (.SEL({Vdd,ActiveLine[0],ActiveLine[1],ActiveLine[2],
                 ActiveLine[3],ActiveLine[4],ActiveLine[5],ActiveLine[6]}),
           .Y(w1));

decoder_inv #(8,128,"verilog/dec7to128e.codefile","2")
    DEC_B (.SEL({Vdd,ager_line[0],ager_line[1],ager_line[2],
                 ager_line[3],ager_line[4],ager_line[5],ager_line[6]}),.YBAR(set_));

nand2 #(128,0,"AUTO","1") NAND_A (.IN0(w1),.IN1(fetch_done128),.Y(w2));
and2 #(128,0,"AUTO","1") AND_E (.IN0(w2),.IN1(HRESET128_),.Y(reset_));
srlatch128 AgedReg (set_,reset_,Aged);
scntr_c #(7,0,"AUTO","1")
    ager_counter (.CLK(ager_en),.CLR(HRESET_),.EN(Vdd),.COUT(nc2),
                  .Q(ager_line));
stdand2 AND_F (.IN0(CLK),.IN1(cause_aging),.Y(ager_en));
nand2 #(128,0,"AUTO","1")
    NAND_B (.IN0(all_lines_valid128),.IN1(Aged),.Y(w4));
and2 #(128,0,"AUTO","1") AND_C (.IN0(w4),.IN1(Valid),.Y(w5));
nor2 #(128,0,"AUTO","1") NOR_D (.IN0(w1),.IN1(w5),.Y(w6));
stdor2 OR_F (.IN0(new_replace),.IN1(cause_aging),.Y(latch_en));
stdbuf #("19") LatchEnableBuffer (.IN0(latch_en),.Y(latch_en_buf));
enc128to7lo ENC1 (.I(w7),.A(ReplaceLine),.ei(Vdd),
                  .eo(cause_aging),.gs(nc1));
latch_c #(128,0,"AUTO","1")
    ReplaceLineLatch (.EN(latch_en_buf),.CLR(HRESET_),.D(w6),.Q(w7));


endmodule 
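
The replacement condition given in the header can be evaluated line by line in software. The Python sketch below applies the stated predicate and then picks the lowest-index candidate, as the priority encoder does; the function names replaceable and replace_line are hypothetical, and the model ignores the latching and timing of the hardware.

```python
# Reference model of the line replacement rule:
#   not (X == ActiveLine) AND (not Valid[X] OR (all_lines_valid AND Aged[X]))

def replaceable(valid, aged, active_line):
    """Return one Boolean per line telling whether that line may be replaced."""
    all_lines_valid = all(valid)
    return [
        x != active_line and ((not valid[x]) or (all_lines_valid and aged[x]))
        for x in range(len(valid))
    ]


def replace_line(valid, aged, active_line):
    """Return the lowest-index replaceable line, or None if there is none.

    None corresponds to the case in which the encoder finds no candidate
    and its enable output is used to cause aging.
    """
    for x, ok in enumerate(replaceable(valid, aged, active_line)):
        if ok:
            return x
    return None
```

When every line is valid and none is aged, replace_line returns None, which is the situation in which the hardware advances ager_counter until some line other than the active one becomes aged.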


8. OR Gate With 128 Inputs, One Output

/******************************************************************************
128-INPUT OR GATE
Filename: or128.v
Author: Joseph R. Robert, Jr.
Date: 21DEC95
Revised: 23JAN96
******************************************************************************/
module or128 (in,out); //An OR tree, equivalent to a 128-input OR gate.
input [127:0] in;
output out; 
wire out; 


wire [31:0] A; 
wire [7:0] B; 
wire [1:0] C; 


or4 #(32,0,"AUTO","1") OR_A (.IN0(in[127:96]),.IN1(in[95:64]),
                             .IN2(in[63:32]),.IN3(in[31:0]),.Y(A));

or4 #( 8,0,"AUTO","1") OR_B (.IN0(A[31:24]),.IN1(A[23:16]),
                             .IN2(A[15:8]),.IN3(A[7:0]),.Y(B));

or4 #( 2,0,"AUTO","1") OR_C (.IN0(B[7:6]),.IN1(B[5:4]),
                             .IN2(B[3:2]),.IN3(B[1:0]),.Y(C));

stdor2 OR_D (.IN0(C[1]),.IN1(C[0]),.Y(out));


endmodule 


9. Predicted Memory Address List

/******************************************************************************
PREDICTED MEMORY ADDRESS LIST
Filename: predma_list.v
Author: Joseph R. Robert, Jr.
Date: 21DEC95
Revised: 13FEB96

Purpose: This structural model is a register file for storing predicted memory addresses. This list is composed of
128 address registers, 128 equality comparators, and 128 Valid status flags. The NAR is stored in this list at the
fetch_done pulse. If there is a match with the input address (in_addr), a priority encoder (ENC_C) determines
which line matches.
******************************************************************************/
module predicted_ma_list (NAR,in_addr,ActiveLine,fetch_done,flush,HRESET_,
                          Valid,match_line,match);
// epoch set_attribute FIXEDBLOCK = 1


input [26:0] NAR, in_addr; 

input [6:0] ActiveLine; 

input fetch_done,flush,HRESET_; 
output [127:0] Valid; 

output [6:0] match_line; 

output match; 


wire [127:0] store_en,store_en_buf,flush_enable_,set_,reset_,
             Valid,equal,m,HRESET128_;

wire nc1,nc2;

supply1 Vdd;

split128 hreset_splitter (.in(HRESET_),.out(HRESET128_));
decoder #(8,128,"verilog/dec7to128e.codefile","2")
    DEC_A (.SEL({fetch_done,ActiveLine[0],ActiveLine[1],ActiveLine[2],
                 ActiveLine[3],ActiveLine[4],ActiveLine[5],ActiveLine[6]}),
           .Y(store_en));

buff #(128,0,"AUTO","8") StoreEnBuffer (.IN0(store_en),.Y(store_en_buf));
decoder_inv #(8,128,"verilog/dec7to128e.codefile","2")
    DEC_B (.SEL({flush,ActiveLine[0],ActiveLine[1],ActiveLine[2],
                 ActiveLine[3],ActiveLine[4],ActiveLine[5],ActiveLine[6]}),
           .YBAR(flush_enable_));

inv #(128,0,"AUTO","1") INV_A (.IN0(store_en_buf),.Y(set_));
and2 #(128,0,"AUTO","1")
    AND_B (.IN0(flush_enable_),.IN1(HRESET128_),.Y(reset_));
srlatch128 Valid_latch (.S_(set_),.R_(reset_),.Q(Valid));
and2 #(128,0,"AUTO","1")
    AND_C (.IN0(Valid),.IN1(equal),.Y(m));
or128 MATCH_OR (.in(m),.out(match));
enc128to7lo ENC_C (.I(m),.A(match_line),.ei(Vdd),.eo(nc1),.gs(nc2));


// epoch pre_compiled addre 

addre PredMA0 (NAR, in_addr, store_en_buf[0], equal[0], HRESET_);
addre PredMA1 (NAR, in_addr, store_en_buf[1], equal[1], HRESET_);
addre PredMA2 (NAR, in_addr, store_en_buf[2], equal[2], HRESET_);
addre PredMA3 (NAR, in_addr, store_en_buf[3], equal[3], HRESET_);
addre PredMA4 (NAR, in_addr, store_en_buf[4], equal[4], HRESET_);
addre PredMA5 (NAR, in_addr, store_en_buf[5], equal[5], HRESET_);
addre PredMA6 (NAR, in_addr, store_en_buf[6], equal[6], HRESET_);
addre PredMA7 (NAR, in_addr, store_en_buf[7], equal[7], HRESET_);
addre PredMA8 (NAR, in_addr, store_en_buf[8], equal[8], HRESET_);
addre PredMA9 (NAR, in_addr, store_en_buf[9], equal[9], HRESET_);
addre PredMA10 (NAR, in_addr, store_en_buf[10], equal[10], HRESET_);
addre PredMA11 (NAR, in_addr, store_en_buf[11], equal[11], HRESET_);
addre PredMA12 (NAR, in_addr, store_en_buf[12], equal[12], HRESET_);
addre PredMA13 (NAR, in_addr, store_en_buf[13], equal[13], HRESET_);
addre PredMA14 (NAR, in_addr, store_en_buf[14], equal[14], HRESET_);
addre PredMA15 (NAR, in_addr, store_en_buf[15], equal[15], HRESET_);
addre PredMA16 (NAR, in_addr, store_en_buf[16], equal[16], HRESET_);
addre PredMA17 (NAR, in_addr, store_en_buf[17], equal[17], HRESET_);
addre PredMA18 (NAR, in_addr, store_en_buf[18], equal[18], HRESET_);
addre PredMA19 (NAR, in_addr, store_en_buf[19], equal[19], HRESET_);
addre PredMA20 (NAR, in_addr, store_en_buf[20], equal[20], HRESET_);
addre PredMA21 (NAR, in_addr, store_en_buf[21], equal[21], HRESET_);
addre PredMA22 (NAR, in_addr, store_en_buf[22], equal[22], HRESET_);
addre PredMA23 (NAR, in_addr, store_en_buf[23], equal[23], HRESET_);
addre PredMA24 (NAR, in_addr, store_en_buf[24], equal[24], HRESET_);
addre PredMA25 (NAR, in_addr, store_en_buf[25], equal[25], HRESET_);
addre PredMA26 (NAR, in_addr, store_en_buf[26], equal[26], HRESET_);
addre PredMA27 (NAR, in_addr, store_en_buf[27], equal[27], HRESET_);
addre PredMA28 (NAR, in_addr, store_en_buf[28], equal[28], HRESET_);
addre PredMA29 (NAR, in_addr, store_en_buf[29], equal[29], HRESET_);
addre PredMA30 (NAR, in_addr, store_en_buf[30], equal[30], HRESET_);
addre PredMA31 (NAR, in_addr, store_en_buf[31], equal[31], HRESET_);
addre PredMA32 (NAR, in_addr, store_en_buf[32], equal[32], HRESET_);
addre PredMA33 (NAR, in_addr, store_en_buf[33], equal[33], HRESET_);
addre PredMA34 (NAR, in_addr, store_en_buf[34], equal[34], HRESET_);
addre PredMA35 (NAR, in_addr, store_en_buf[35], equal[35], HRESET_);
addre PredMA36 (NAR, in_addr, store_en_buf[36], equal[36], HRESET_);
addre PredMA37 (NAR, in_addr, store_en_buf[37], equal[37], HRESET_);
addre PredMA38 (NAR, in_addr, store_en_buf[38], equal[38], HRESET_);
addre PredMA39 (NAR, in_addr, store_en_buf[39], equal[39], HRESET_);
addre PredMA40 (NAR, in_addr, store_en_buf[40], equal[40], HRESET_);
addre PredMA41 (NAR, in_addr, store_en_buf[41], equal[41], HRESET_);
addre PredMA42 (NAR, in_addr, store_en_buf[42], equal[42], HRESET_);
addre PredMA43 (NAR, in_addr, store_en_buf[43], equal[43], HRESET_);
addre PredMA44 (NAR, in_addr, store_en_buf[44], equal[44], HRESET_);
addre PredMA45 (NAR, in_addr, store_en_buf[45], equal[45], HRESET_);
addre PredMA46 (NAR, in_addr, store_en_buf[46], equal[46], HRESET_);
addre PredMA47 (NAR, in_addr, store_en_buf[47], equal[47], HRESET_);
addre PredMA48 (NAR, in_addr, store_en_buf[48], equal[48], HRESET_);
addre PredMA49 (NAR, in_addr, store_en_buf[49], equal[49], HRESET_);
addre PredMA50 (NAR, in_addr, store_en_buf[50], equal[50], HRESET_);
addre PredMA51 (NAR, in_addr, store_en_buf[51], equal[51], HRESET_);
addre PredMA52 (NAR, in_addr, store_en_buf[52], equal[52], HRESET_);
addre PredMA53 (NAR, in_addr, store_en_buf[53], equal[53], HRESET_);
addre PredMA54 (NAR, in_addr, store_en_buf[54], equal[54], HRESET_);
addre PredMA55 (NAR, in_addr, store_en_buf[55], equal[55], HRESET_);
addre PredMA56 (NAR, in_addr, store_en_buf[56], equal[56], HRESET_);
addre PredMA57 (NAR, in_addr, store_en_buf[57], equal[57], HRESET_);
addre PredMA58 (NAR, in_addr, store_en_buf[58], equal[58], HRESET_);
addre PredMA59 (NAR, in_addr, store_en_buf[59], equal[59], HRESET_);
addre PredMA60 (NAR, in_addr, store_en_buf[60], equal[60], HRESET_);
addre PredMA61 (NAR, in_addr, store_en_buf[61], equal[61], HRESET_);
addre PredMA62 (NAR, in_addr, store_en_buf[62], equal[62], HRESET_);
addre PredMA63 (NAR, in_addr, store_en_buf[63], equal[63], HRESET_);
addre PredMA64 (NAR, in_addr, store_en_buf[64], equal[64], HRESET_);
addre PredMA65 (NAR, in_addr, store_en_buf[65], equal[65], HRESET_);
addre PredMA66 (NAR, in_addr, store_en_buf[66], equal[66], HRESET_);
addre PredMA67 (NAR, in_addr, store_en_buf[67], equal[67], HRESET_);
addre PredMA68 (NAR, in_addr, store_en_buf[68], equal[68], HRESET_);
addre PredMA69 (NAR, in_addr, store_en_buf[69], equal[69], HRESET_);
addre PredMA70 (NAR, in_addr, store_en_buf[70], equal[70], HRESET_);
addre PredMA71 (NAR, in_addr, store_en_buf[71], equal[71], HRESET_);
addre PredMA72 (NAR, in_addr, store_en_buf[72], equal[72], HRESET_);
addre PredMA73 (NAR, in_addr, store_en_buf[73], equal[73], HRESET_);
addre PredMA74 (NAR, in_addr, store_en_buf[74], equal[74], HRESET_);
addre PredMA75 (NAR, in_addr, store_en_buf[75], equal[75], HRESET_);
addre PredMA76 (NAR, in_addr, store_en_buf[76], equal[76], HRESET_);
addre PredMA77 (NAR, in_addr, store_en_buf[77], equal[77], HRESET_);
addre PredMA78 (NAR, in_addr, store_en_buf[78], equal[78], HRESET_);
addre PredMA79 (NAR, in_addr, store_en_buf[79], equal[79], HRESET_);
addre PredMA80 (NAR, in_addr, store_en_buf[80], equal[80], HRESET_);
addre PredMA81 (NAR, in_addr, store_en_buf[81], equal[81], HRESET_);
addre PredMA82 (NAR, in_addr, store_en_buf[82], equal[82], HRESET_);
addre PredMA83 (NAR, in_addr, store_en_buf[83], equal[83], HRESET_);
addre PredMA84 (NAR, in_addr, store_en_buf[84], equal[84], HRESET_);
addre PredMA85 (NAR, in_addr, store_en_buf[85], equal[85], HRESET_);
addre PredMA86 (NAR, in_addr, store_en_buf[86], equal[86], HRESET_);
addre PredMA87 (NAR, in_addr, store_en_buf[87], equal[87], HRESET_);
addre PredMA88 (NAR, in_addr, store_en_buf[88], equal[88], HRESET_);
addre PredMA89 (NAR, in_addr, store_en_buf[89], equal[89], HRESET_);
addre PredMA90 (NAR, in_addr, store_en_buf[90], equal[90], HRESET_);
addre PredMA91 (NAR, in_addr, store_en_buf[91], equal[91], HRESET_);
addre PredMA92 (NAR, in_addr, store_en_buf[92], equal[92], HRESET_);
addre PredMA93 (NAR, in_addr, store_en_buf[93], equal[93], HRESET_);
addre PredMA94 (NAR, in_addr, store_en_buf[94], equal[94], HRESET_);
addre PredMA95 (NAR, in_addr, store_en_buf[95], equal[95], HRESET_);
addre PredMA96 (NAR, in_addr, store_en_buf[96], equal[96], HRESET_);
addre PredMA97 (NAR, in_addr, store_en_buf[97], equal[97], HRESET_);
addre PredMA98 (NAR, in_addr, store_en_buf[98], equal[98], HRESET_);
addre PredMA99 (NAR, in_addr, store_en_buf[99], equal[99], HRESET_);
addre PredMA100 (NAR, in_addr, store_en_buf[100], equal[100], HRESET_);
addre PredMA101 (NAR, in_addr, store_en_buf[101], equal[101], HRESET_);
addre PredMA102 (NAR, in_addr, store_en_buf[102], equal[102], HRESET_);
addre PredMA103 (NAR, in_addr, store_en_buf[103], equal[103], HRESET_);
addre PredMA104 (NAR, in_addr, store_en_buf[104], equal[104], HRESET_);
addre PredMA105 (NAR, in_addr, store_en_buf[105], equal[105], HRESET_);
addre PredMA106 (NAR, in_addr, store_en_buf[106], equal[106], HRESET_);
addre PredMA107 (NAR, in_addr, store_en_buf[107], equal[107], HRESET_);
addre PredMA108 (NAR, in_addr, store_en_buf[108], equal[108], HRESET_);
addre PredMA109 (NAR, in_addr, store_en_buf[109], equal[109], HRESET_);
addre PredMA110 (NAR, in_addr, store_en_buf[110], equal[110], HRESET_);
addre PredMA111 (NAR, in_addr, store_en_buf[111], equal[111], HRESET_);
addre PredMA112 (NAR, in_addr, store_en_buf[112], equal[112], HRESET_);
addre PredMA113 (NAR, in_addr, store_en_buf[113], equal[113], HRESET_);
addre PredMA114 (NAR, in_addr, store_en_buf[114], equal[114], HRESET_);
addre PredMA115 (NAR, in_addr, store_en_buf[115], equal[115], HRESET_);
addre PredMA116 (NAR, in_addr, store_en_buf[116], equal[116], HRESET_);
addre PredMA117 (NAR, in_addr, store_en_buf[117], equal[117], HRESET_);
addre PredMA118 (NAR, in_addr, store_en_buf[118], equal[118], HRESET_);
addre PredMA119 (NAR, in_addr, store_en_buf[119], equal[119], HRESET_);
addre PredMA120 (NAR, in_addr, store_en_buf[120], equal[120], HRESET_);
addre PredMA121 (NAR, in_addr, store_en_buf[121], equal[121], HRESET_);
addre PredMA122 (NAR, in_addr, store_en_buf[122], equal[122], HRESET_);
addre PredMA123 (NAR, in_addr, store_en_buf[123], equal[123], HRESET_);
addre PredMA124 (NAR, in_addr, store_en_buf[124], equal[124], HRESET_);
addre PredMA125 (NAR, in_addr, store_en_buf[125], equal[125], HRESET_);
addre PredMA126 (NAR, in_addr, store_en_buf[126], equal[126], HRESET_);
addre PredMA127 (NAR, in_addr, store_en_buf[127], equal[127], HRESET_);


endmodule 


10. One-to-128 Wire Splitter 
1 TO 128 WIRE SPLITTER

Filename: split128.v

Author: Joseph R. Robert, Jr.

Date: 21DEC95

Revised: 23JAN96 


Purpose: Splits a wire into 128 wires. 
module split128 (in,out); //Splits a wire into 128 wires.
input in;
output [127:0] out;

assign out = {in,in,in,in,in,in,in,in,in,in,in,in,in,in,in,in,
              in,in,in,in,in,in,in,in,in,in,in,in,in,in,in,in,
              in,in,in,in,in,in,in,in,in,in,in,in,in,in,in,in,
              in,in,in,in,in,in,in,in,in,in,in,in,in,in,in,in,
              in,in,in,in,in,in,in,in,in,in,in,in,in,in,in,in,
              in,in,in,in,in,in,in,in,in,in,in,in,in,in,in,in,
              in,in,in,in,in,in,in,in,in,in,in,in,in,in,in,in,
              in,in,in,in,in,in,in,in,in,in,in,in,in,in,in,in};
endmodule
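
The 128-element concatenation in split128 can also be expressed with Verilog's replication operator. The sketch below is an equivalent form for comparison only; the module name split128_repl is hypothetical and is not part of the original design files.

```verilog
// Sketch: same fan-out as split128, written with the replication operator.
module split128_repl (in,out);   // hypothetical name
input          in;
output [127:0] out;

assign out = {128{in}};          // 128 copies of the 1-bit input
endmodule
```

The same form, {7{in}}, would cover the seven-wire splitter.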


11. One-to-Seven Wire Splitter 
1 TO 7 WIRE SPLITTER
Filename: split7.v

Author: Joseph R. Robert, Jr.
Date: 21DEC95

Revised: 23JAN96


Purpose: Splits a wire into 7 wires.
module split7 (in,out); //Splits a wire into 7 wires.
input in;
output [6:0] out;

assign out = {in,in,in,in,in,in,in};

endmodule


12. Set, Reset Latch
STANDARD SET,RESET LATCH
Filename: srlatch.v

Author: Joseph R. Robert, Jr.

Date: 21DEC95

Revised: 23JAN96


Reset has priority.
module srlatch (S_, R_, Q);
input S_,R_;
output Q;
wire w1,w2,Q,Q_;

stdnand2 NAND_A (.IN0(w2),.IN1(Q_),.Y(Q));
stdnand2 NAND_B (.IN0(R_),.IN1(Q),.Y(Q_));
stdnand2 NAND_C (.IN0(w1),.IN1(R_),.Y(w2));
stdinv INV_D (.IN0(S_),.Y(w1));

endmodule
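
The behavior of the NAND network in srlatch, with active-low inputs and reset priority, can be stated more directly in behavioral form. This is a sketch for reading alongside the structural model; the module name srlatch_sketch is hypothetical.

```verilog
// Behavioral sketch of an active-low set/reset latch with reset priority.
module srlatch_sketch (S_,R_,Q);   // hypothetical name
input  S_,R_;
output Q;
reg    Q;

always @(S_ or R_)
    if (!R_)      Q = 1'b0;   // reset wins, even if S_ is also low
    else if (!S_) Q = 1'b1;   // set
    // with both inputs high, Q holds its previous value
endmodule
```

Tracing the gates gives the same result: with R_ low, NAND_B forces Q_ high and NAND_C forces w2 high, so NAND_A drives Q low regardless of S_.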


13. Set, Reset Latch Array 128 Bits Wide 
ARRAY OF 128 SET,RESET LATCHES
Filename: srlatch128.v 

Author: Joseph R. Robert, Jr. 

Date: 21DEC95 

Revised: 23JAN96 
module srlatch128 (S_, R_, Q); //set-reset latch
input [127:0] S_,R_;
output [127:0] Q;
wire [127:0] w1,w2,Q,Q_;

nand2 #(128,0,"AUTO","1") NAND_A (.IN0(w2),.IN1(Q_),.Y(Q));
nand2 #(128,0,"AUTO","1") NAND_B (.IN0(R_),.IN1(Q),.Y(Q_));
nand2 #(128,0,"AUTO","1") NAND_C (.IN0(w1),.IN1(R_),.Y(w2));
inv #(128,0,"AUTO","1") INV_C (.IN0(S_),.Y(w1));

endmodule


E. PREDICTOR
PREDICTOR

Filename: predictor.v
Author: Joseph R. Robert, Jr.
Date: 21DEC95


Revised: 06FEB96


Purpose: This module calculates the Next Address (stored in NAR) based on the Most Recent Memory Access
(MRMA) and the Current Address (in the CAR). The prediction calculation is


NAR = 2*CAR - MRMA


In this structural implementation of the Predictor, the predict signal latches in the CAR and MRMA inputs. The
subtraction is accomplished as a 2's complement addition with a high speed adder.


The CAR is multiplied by 2 by concatenating a zero at the least significant end. The most significant bit of the
CAR is not retained, since it will not have an effect on the 27-bit output of the adder. This would adversely affect
address prediction only around the mid-point of the 4 gigabytes of memory. The Golden Rule here is "Design for
the common case."


A number is negated in 2's complement by inverting all the bits and adding 1. The MRMA is negated by inverting
all its bits. Adding the required 1 is implemented as a Carry-In to the adder.


Epoch's TACTIC reported the propagation delay from predict to NAR to be 4.90 ns.
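
The arithmetic described above can be sketched behaviorally as follows. This is an illustration, not the Epoch structural model: the module name predictor_sketch is hypothetical, and the registers are assumed to be rising-edge triggered on predict with an active-low asynchronous clear.

```verilog
// Behavioral sketch of NAR = 2*CAR - MRMA using inverted MRMA plus a carry-in.
module predictor_sketch (MRMA,CAR,predict,NAR,HRESET_);   // hypothetical name
input  [26:0] MRMA;
input  [25:0] CAR;
input         predict,HRESET_;
output [26:0] NAR;

reg [26:0] a;   // 2*CAR: CAR with a zero appended at the least significant end
reg [26:0] c;   // latched MRMA

// Assumption: rising edge of predict, active-low asynchronous clear.
always @(posedge predict or negedge HRESET_)
    if (!HRESET_) begin
        a <= 27'd0;
        c <= 27'd0;
    end
    else begin
        a <= {CAR,1'b0};
        c <= MRMA;
    end

// a - c computed as a + (~c) + 1; the final 1 plays the role of the carry-in.
assign NAR = a + ~c + 27'd1;
endmodule
```

For instance, with CAR = 100 and MRMA = 96 (a stride of 4 lines), the sum is 200 - 96 = 104, which continues the stride.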
module predictor (MRMA,CAR,predict,NAR,HRESET_);
//CAR is [30:5] of 32-bit address
//MRMA and NAR are [31:5] of 32-bit address

// epoch set_attribute FIXEDBLOCK = 1
input [26:0] MRMA;
input [25:0] CAR;
input predict,HRESET_;
output [26:0] NAR;

wire [26:0] NAR,A,B,C;
wire nc;

`define group "predictor"

supply0 gnd;
supply1 vdd;

assign A[0] = gnd;
dff_c #(26,1,`group,"1")
    CAR_latch (.D(CAR),.CLK(predict),.CLR(HRESET_),.Q(A[26:1]));

dff_c #(27,1,`group,"1")
    MRMA_latch (.D(MRMA),.CLK(predict),.CLR(HRESET_),.Q(C));

bufinv #(27,1,`group,"1","speed")
    bit_complement (.IN0(C),.Y(B));
addhs #(27,1,`group,"1")
    adder (.A(A),.B(B),.CIN(vdd),.COUT(nc),.SUM(NAR));

endmodule


F. DATA LIST


DATA LIST

Filename: datalist.v

Author: Joseph R. Robert, Jr.
Date: 15DEC95

Revised: 07FEB96


Purpose: This module stores the data retrieved from memory in anticipation of a request by the CPU.


The basic memory cell is Epoch's hsramoe (high speed ram with output enable). Since each hsram has
a maximum word size of 128 bits, there are two hsram parts in parallel to get the required 256-bit width.

An upload signal causes the Data List to store the data on data_line into the address specified by
ActiveLine. The input upload has to be inverted to match the active-low WR input of the Epoch hsram component.

A download signal causes the Data List to assert onto data_line the data in the address specified by
ActiveLine. This signal also has to be inverted for the same reason.

Both inverters can probably be removed if the Bus Interface Unit makes the upload and download
signals active low. That could only improve the response time of this data memory.

Epoch calculated the following timing delays:


download -> hsramoe.DOUT      2.3 ns
ActiveLine -> hsramoe.DOUT    7.3 ns


A design alternative is to use the regular speed version, ramoe, with the following timing delays:


download -> ramoe.DOUT        4 ns
ActiveLine -> ramoe.DOUT      16 ns


Using this slower RAM is possible, but would require a significant modification to the PRC behavior to handle the
longer delay, and would add a cycle delay to CPU reads when there is a hit in the PRC.

Putting this module's VerilogOut file into the original PRC behavioral model for mixed-mode simulation
caused a timing error that had to be corrected in the Bus Interface Unit. After an upload to the DataList, data_line
must remain valid for long enough to meet the data hold time requirement of Epoch's hsramoe.
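
The upload and download behavior described above can be sketched behaviorally as a 128-line by 256-bit memory on a shared bidirectional bus. This is only a sketch under assumptions: the module name datalist_sketch is hypothetical, the write is assumed to be level-sensitive while upload is high, and none of the hsramoe setup or hold timing is modeled.

```verilog
// Behavioral sketch of the Data List (illustrative, no timing).
module datalist_sketch (data_line,ActiveLine,upload,download);   // hypothetical name
input  [6:0]   ActiveLine;
input          upload,download;
inout  [255:0] data_line;

reg [255:0] mem [0:127];   // 128 lines of 256 bits
reg [255:0] dout;

// Assumption: the RAM writes while upload is high (level-sensitive),
// mirroring an active-low WR pin behind the inverter.
always @(upload or ActiveLine or data_line)
    if (upload) mem[ActiveLine] = data_line;

// Read the addressed line whenever a download is requested.
always @(download or ActiveLine)
    if (download) dout = mem[ActiveLine];

// Drive the shared bus only during a download; otherwise float it.
assign data_line = download ? dout : 256'bz;
endmodule
```

Because the write is level-sensitive in this sketch, data_line must stay stable until upload falls, which is the same hold-time concern noted for the hsramoe.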
module datalist (data_line,ActiveLine,upload,download);
// epoch set_attribute FIXEDBLOCK = 1

input [6:0] ActiveLine;
input upload,download;
inout [255:0] data_line;

wire [255:0] data_line;
wire write_,enable_;

//STRUCTURE
stdbufinv upload_inv (.IN0(upload),.Y(write_));
stdbufinv download_inv (.IN0(download),.Y(enable_));

hsramoe #(128,128,7,32,1,"1")
    data_ram1 (.A(ActiveLine),.DIN(data_line[127:0]),.DOUT(data_line[127:0]),
               .WR(write_),.OE(enable_));
hsramoe #(128,128,7,32,1,"1")
    data_ram0 (.A(ActiveLine),.DIN(data_line[255:128]),.DOUT(data_line[255:128]),
               .WR(write_),.OE(enable_));

endmodule


G. BUS INTERFACE 
* BUS INTERFACE UNIT
* Filename: bus_interface.v
* Author: Joseph R. Robert, Jr.
* Date: 01OCT95
* Revised: 20MAR96
Purpose: This module connects the PRC with the system bus. It handles the protocol of data transfer in and out
of the PRC.

When this module receives a fetch signal, it latches the address in the NAR and requests the bus for a
burst read. It stores the incoming data until all four beats of the burst have been received. Then it uploads the data into the
Data List and asserts fetch_done. If there is a parity error during the fetch, the Bus Interface informs the Controller
by asserting fetch_abort, and the transaction is cancelled.

When this module receives a send signal, it sends a cancel signal (CANX) to the memory module,
downloads data from the Data List, and then sends the data to the CPU. When the transfer is finished, it asserts
send_done.

The coordination of these activities is accomplished through the use of two Finite State Machines. One
acts as an address bus master, and the other controls the flow of data.
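
During a send, the four 64-bit beats are returned starting with the word selected by BURSTSTART and wrapping around; this is the order encoded by the SENDxy states of the data state machine (x is the starting word, y the word currently driven). The sketch below restates that ordering as arithmetic. It is illustrative only: the module name burst_order_sketch and the beat counter are hypothetical, since the design uses explicit states rather than a counter.

```verilog
// Sketch of the wrap-around beat ordering used when sending data to the CPU.
module burst_order_sketch (burst_start,beat,sel);   // hypothetical name
input  [1:0] burst_start;   // first 64-bit word requested
input  [1:0] beat;          // 0..3, position within the burst
output [1:0] sel;           // index N of the d_enN_ enable asserted on this beat

assign sel = burst_start + beat;   // 2-bit sum wraps modulo 4
endmodule
```

With burst_start = 2, the beats select words 2, 3, 0, 1, matching the state sequence SEND22, SEND23, SEND20, SEND21.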
module bus_interface (NAR_IN,BURSTSTART,BG_,AACK_,DBG_,send,fetch,
                      clk,BR_,upload,download,fetch_done,fetch_abort,
                      send_done,CANX,snoop_ignore,DATALINE,D,A,AP,DP,DPE_,
                      TT,TSIZ,TC,ABB_,TS_,TBST_,DBB_,TA_,HRESET_);
// epoch set_attribute FIXEDBLOCK = 1

// Signals are defined in system.v.

input [26:0] NAR_IN;
input [1:0] BURSTSTART;
input BG_,AACK_,DBG_,send,fetch,clk,HRESET_;
output BR_,upload,download,fetch_done,fetch_abort;
output send_done,DPE_,CANX,snoop_ignore;
inout [255:0] DATALINE;
inout [63:0] D;
inout [31:0] A;
inout [7:0] DP;
inout [4:0] TT;
inout [3:0] AP;
inout [2:0] TSIZ;
inout [1:0] TC;
inout ABB_,TS_,TBST_,DBB_,TA_;

tri [255:0] DATALINE;
tri [63:0] D;
tri [31:0] A;
tri [7:0] DP;
tri [4:0] TT;
tri [3:0] AP;
tri [2:0] TSIZ;
tri [1:0] TC;
tri ABB_,TS_,TBST_,DBB_,TA_,DPE_;

supply1 VDD;
supply0 GND;

//Address section wires
wire [26:0] a_reg,NAR;
wire [3:0] ap_reg,addr_parity_gen;
wire qual_BG_;

//Data section wires
wire [255:0] data,mux_out;
wire [31:0] dparity,dparity_gen;
//wire [3:0] dreg_clk;
wire [1:0] burst_start;
wire bs_clk,dreg0_clk,dreg1_clk,dreg2_clk,dreg3_clk,data_parity_error,qual_DBG_,
     dreg0_clk_buf,dreg1_clk_buf,dreg2_clk_buf,dreg3_clk_buf,a_en_buf_,CANX,
     dataline_en_buf_,d_en0_buf_,d_en1_buf_,d_en2_buf_,d_en3_buf_,ta,
     latch0_delay,latch1_delay,latch2_delay,latch3_delay;


//ADDRESS BUS INTERFACE

assign qual_BG_ = ~(ABB_ & !BG_);

// Next Address Register
dff #(27,0,"AUTO","AUTO")
    NextAddressReg (.CLK(NARLatch),.D(NAR_IN),.Q(NAR));

//Generate address parity.
parityo_gen32
    AddrParityGen (.D({NAR,GND,GND,GND,GND,GND}),.PGEN(addr_parity_gen));

//Address Output Registers and buffers
dff #(27,0,"AUTO","AUTO")
    AddressReg (.CLK(a_latch),.D(NAR),.Q(a_reg));
dff #(4,0,"AUTO","AUTO")
    AddrParReg (.CLK(a_latch),.D(addr_parity_gen),.Q(ap_reg));
tribuf #(32,0,"AUTO","AUTO")
    a_buffer (.EN(a_en_buf_),.IN0({a_reg,GND,GND,GND,GND,GND}),.Y(A));
stdbuf #("9") AEN_BUF (.IN0(a_en_),.Y(a_en_buf_));
tribuf #(4,0,"AUTO","AUTO")
    ap_buffer (.EN(a_en_),.IN0(ap_reg),.Y(AP));
tribuf #(5,0,"AUTO","AUTO")
    tt_buffer (.EN(a_en_),.IN0({GND,VDD,VDD,VDD,GND}),.Y(TT));
tribuf #(3,0,"AUTO","AUTO")
    tsize_buffer (.EN(a_en_),.IN0({GND,VDD,GND}),.Y(TSIZ));
tribuf #(2,0,"AUTO","AUTO")
    tcode_buffer (.EN(a_en_),.IN0({GND,GND}),.Y(TC));

stdtribuf abb_buffer (.EN(abb_en_),.IN0(abb_reg_),.Y(ABB_));
stdtribuf tbst_buffer (.EN(tbst_en_),.IN0(GND),.Y(TBST_));
stdtribuf ts_buffer (.EN(ts_en_),.IN0(ts_reg_),.Y(TS_));


//ADDRESS FINITE STATE MACHINE

parameter // epoch enum astat
    A_IDLE = 3'd0,
    WAIT_FOR_BG = 3'd1,
    MASTER = 3'd2,
    TRANSFER = 3'd3,
    WAIT_FOR_AACK = 3'd4,
    TERMINATION = 3'd5,
    WAIT_FOR_NOT_FETCH = 3'd6,
    dc_astate = 3'bxxx;

reg [2:0] /* epoch enum astat */ astate, next_astate;
reg a_latch,a_en_,abb_reg_,abb_en_,BR_,NARLatch,snoop_ignore,
    tbst_en_,ts_reg_,ts_en_;

always @(posedge clk or negedge HRESET_)
begin
    if (!HRESET_)
        astate = A_IDLE;
    else
        astate = next_astate;
end


always @(astate or fetch or qual_BG_ or AACK_)
begin
    //default values
    a_latch = 1'b0;
    a_en_ = 1'b1;
    abb_reg_ = 1'b1;
    abb_en_ = 1'b1;
    BR_ = 1'b1;
    NARLatch = 1'b0;
    snoop_ignore = 1'b0;
    tbst_en_ = 1'b1;
    ts_reg_ = 1'b1;
    ts_en_ = 1'b1;

    case (astate)

    A_IDLE:
        begin
            if (fetch)
                next_astate = WAIT_FOR_BG;
            else next_astate = A_IDLE;
        end

    WAIT_FOR_BG:
        begin
            BR_ = 1'b0; // Request the bus.
            NARLatch = 1'b1; // Latch the Next Address.
            if (qual_BG_ == 1'b0)
                next_astate = MASTER;
            else next_astate = WAIT_FOR_BG;
        end

    MASTER:
        begin
            a_latch = 1'b1; // Latch transfer attributes.
            a_en_ = 1'b0; // Enable attribute outputs.
            abb_reg_ = 1'b0; // Take the address bus.
            abb_en_ = 1'b0;
            snoop_ignore = 1'b1; // Tell snooper to ignore this transaction.
            tbst_en_ = 1'b0; // Another transfer attribute.
            ts_reg_ = 1'b0; // Start the transfer.
            ts_en_ = 1'b0;
            next_astate = TRANSFER;
        end

    TRANSFER:
        begin
            a_en_ = 1'b0;
            abb_reg_ = 1'b0;
            abb_en_ = 1'b0;
            snoop_ignore = 1'b1;
            tbst_en_ = 1'b0;
            ts_reg_ = 1'b1;
            ts_en_ = 1'b0;
            if (AACK_ == 1'b1)
                next_astate = WAIT_FOR_AACK;
            else next_astate = TERMINATION;
        end

    WAIT_FOR_AACK:
        begin
            a_en_ = 1'b0;
            abb_reg_ = 1'b0;
            abb_en_ = 1'b0;
            snoop_ignore = 1'b1;
            tbst_en_ = 1'b0;
            if (AACK_ == 1'b1)
                next_astate = WAIT_FOR_AACK;
            else next_astate = TERMINATION;
        end

    TERMINATION:
        begin
            abb_reg_ = 1'b1; //Relinquish the address bus.
            abb_en_ = 1'b0;
            next_astate = WAIT_FOR_NOT_FETCH;
        end

    WAIT_FOR_NOT_FETCH:
        begin
            if (fetch == 1'b1)
                next_astate = WAIT_FOR_NOT_FETCH;
            else next_astate = A_IDLE;
        end

    default:
        begin
            next_astate = dc_astate;
        end
    endcase

end


//DATA BUS INTERFACE

assign qual_DBG_ = ~(DBB_ & !DBG_);

// burst_start latch
stdand2 BS_AND (.IN0(send),.IN1(clk),.Y(bs_clk));
dff #(2,0,"AUTO","AUTO")
    BurstStartReg (.CLK(bs_clk),.D(BURSTSTART),.Q(burst_start));

// Odd Parity Generator/Checker
// epoch pre_compiled parityo_chkgen256
parityo_chkgen256 DataParityGen
    (.D(data),.PIN(dparity),.ERROR(data_parity_error),.PGEN(dparity_gen));
assign DPE_ = ~data_parity_error;

//data registers
stdbufinv TA_INV (.IN0(TA_),.Y(ta));
//Delay buffer required for timing of latch signals. Gates = 4 results in smallest layout area.
stddelaybuf #(1,4,"AUTO") LatchDelay0(.IN0(latch0),.Y(latch0_delay));
stddelaybuf #(1,4,"AUTO") LatchDelay1(.IN0(latch1),.Y(latch1_delay));
stddelaybuf #(1,4,"AUTO") LatchDelay2(.IN0(latch2),.Y(latch2_delay));
stddelaybuf #(1,4,"AUTO") LatchDelay3(.IN0(latch3),.Y(latch3_delay));
stdand3 #("CRITICAL")
    DR0_AND (.IN0(clk),.IN1(latch0_delay),.IN2(ta),.Y(dreg0_clk));
stdand3 #("CRITICAL")
    DR1_AND (.IN0(clk),.IN1(latch1_delay),.IN2(ta),.Y(dreg1_clk));
stdand3 #("CRITICAL")
    DR2_AND (.IN0(clk),.IN1(latch2_delay),.IN2(ta),.Y(dreg2_clk));
stdand3 #("CRITICAL")
    DR3_AND (.IN0(clk),.IN1(latch3_delay),.IN2(ta),.Y(dreg3_clk));
stdbuf #("CRITICAL") DR0_BUF (.IN0(dreg0_clk),.Y(dreg0_clk_buf));
stdbuf #("CRITICAL") DR1_BUF (.IN0(dreg1_clk),.Y(dreg1_clk_buf));
stdbuf #("CRITICAL") DR2_BUF (.IN0(dreg2_clk),.Y(dreg2_clk_buf));
stdbuf #("CRITICAL") DR3_BUF (.IN0(dreg3_clk),.Y(dreg3_clk_buf));
dff #(72,0,"AUTO","AUTO")
    DataReg0 (.CLK(dreg0_clk_buf),.D({mux_out[ 63:  0],DP}),
              .Q({data[ 63:  0],dparity[ 7: 0]}));
dff #(72,0,"AUTO","AUTO")
    DataReg1 (.CLK(dreg1_clk_buf),.D({mux_out[127: 64],DP}),
              .Q({data[127: 64],dparity[15: 8]}));
dff #(72,0,"AUTO","AUTO")
    DataReg2 (.CLK(dreg2_clk_buf),.D({mux_out[191:128],DP}),
              .Q({data[191:128],dparity[23:16]}));
dff #(72,0,"AUTO","AUTO")
    DataReg3 (.CLK(dreg3_clk_buf),.D({mux_out[255:192],DP}),
              .Q({data[255:192],dparity[31:24]}));


//multiplexer
mux2 #(128,0,"AUTO","AUTO")
    MUXA (.IN0({D,D}),.IN1(DATALINE[127:  0]),.S0(mux_sel),.Y(mux_out[127:  0]));
mux2 #(128,0,"AUTO","AUTO")
    MUXB (.IN0({D,D}),.IN1(DATALINE[255:128]),.S0(mux_sel),.Y(mux_out[255:128]));

//dataline output buffer
stdbuf DATALINE_EN_BUFFER (.IN0(dataline_en_),.Y(dataline_en_buf_));
tribuf #(128,0,"AUTO","AUTO")
    dataline_bufferA (.EN(dataline_en_buf_),.IN0(data[127:  0]),
                      .Y(DATALINE[127:  0]));
tribuf #(128,0,"AUTO","AUTO")
    dataline_bufferB (.EN(dataline_en_buf_),.IN0(data[255:128]),
                      .Y(DATALINE[255:128]));

//data output buffers
tribuf #(64,0,"AUTO","AUTO")
    data_buffer0 (.EN(d_en0_buf_),.IN0(data[ 63:  0]),.Y(D));
tribuf #(64,0,"AUTO","AUTO")
    data_buffer1 (.EN(d_en1_buf_),.IN0(data[127: 64]),.Y(D));
tribuf #(64,0,"AUTO","AUTO")
    data_buffer2 (.EN(d_en2_buf_),.IN0(data[191:128]),.Y(D));
tribuf #(64,0,"AUTO","AUTO")
    data_buffer3 (.EN(d_en3_buf_),.IN0(data[255:192]),.Y(D));

stdbuf DEN0_BUF (.IN0(d_en0_),.Y(d_en0_buf_));
stdbuf DEN1_BUF (.IN0(d_en1_),.Y(d_en1_buf_));
stdbuf DEN2_BUF (.IN0(d_en2_),.Y(d_en2_buf_));
stdbuf DEN3_BUF (.IN0(d_en3_),.Y(d_en3_buf_));

tribuf #(8,0,"AUTO","AUTO")
    dparity_buffer0 (.EN(d_en0_),.IN0(dparity_gen[ 7: 0]),.Y(DP));
tribuf #(8,0,"AUTO","AUTO")
    dparity_buffer1 (.EN(d_en1_),.IN0(dparity_gen[15: 8]),.Y(DP));
tribuf #(8,0,"AUTO","AUTO")
    dparity_buffer2 (.EN(d_en2_),.IN0(dparity_gen[23:16]),.Y(DP));
tribuf #(8,0,"AUTO","AUTO")
    dparity_buffer3 (.EN(d_en3_),.IN0(dparity_gen[31:24]),.Y(DP));

stdtribuf dbb_buffer (.EN(dbb_en_),.IN0(dbb_reg_),.Y(DBB_));
stdtribuf ta_buffer (.EN(ta_en_),.IN0(GND),.Y(TA_));
stdbuf #("26") CANX_BUF (.IN0(cancel),.Y(CANX));


//DATA FINITE STATE MACHINE

parameter // epoch enum dstat
    D_IDLE = 5'd0,
    WAIT_FOR_DBG = 5'd1,
    FIRST_BEAT = 5'd2,
    SECOND_BEAT = 5'd3,
    THIRD_BEAT = 5'd4,
    FOURTH_BEAT = 5'd5,
    FETCH_TERMINATE = 5'd6,
    UPLOAD1 = 5'd7,
    ABORT1 = 5'd8,
    D_WAIT_FOR_NOT_FETCH_A = 5'd9,
    D_WAIT_FOR_NOT_FETCH_B = 5'd10,
    START_SEND = 5'd12,
    SEND00 = 5'd13,
    SEND01 = 5'd14,
    SEND02 = 5'd15,
    SEND03 = 5'd16,
    SEND10 = 5'd17,
    SEND11 = 5'd18,
    SEND12 = 5'd19,
    SEND13 = 5'd20,
    SEND20 = 5'd21,
    SEND21 = 5'd22,
    SEND22 = 5'd23,
    SEND23 = 5'd24,
    SEND30 = 5'd25,
    SEND31 = 5'd26,
    SEND32 = 5'd27,
    SEND33 = 5'd28,
    SEND_TERMINATE = 5'd29,
    dc_dstate = 5'bxxxxx;

reg [4:0] /* epoch enum dstat */ dstate, next_dstate;
reg cancel,dbb_reg_,dbb_en_,dataline_en_,d_en0_,d_en1_,d_en2_,d_en3_,
    download,fetch_done,
    fetch_abort,latch0,latch1,latch2,latch3,mux_sel,send_done,upload,ta_en_;

always @(posedge clk or negedge HRESET_)
begin
    if (!HRESET_)
        dstate = D_IDLE;
    else
        dstate = next_dstate;
end


always @(dstate or fetch or send or qual_DBG_ or TA_ or
         data_parity_error or burst_start)
begin
    //default values
    cancel = 1'b0;
    dbb_reg_ = 1'b1;
    dbb_en_ = 1'b1;
    dataline_en_ = 1'b1;
    d_en0_ = 1'b1;
    d_en1_ = 1'b1;
    d_en2_ = 1'b1;
    d_en3_ = 1'b1;
    download = 1'b0;
    fetch_done = 1'b0;
    fetch_abort = 1'b0;
    latch0 = 1'b0;
    latch1 = 1'b0;
    latch2 = 1'b0;
    latch3 = 1'b0;
    mux_sel = 1'b0;
    send_done = 1'b0;
    ta_en_ = 1'b1;
    upload = 1'b0;

    case (dstate)

    D_IDLE:
        begin
            if (fetch)
                next_dstate = WAIT_FOR_DBG;
            else if (send) next_dstate = START_SEND;
            else next_dstate = D_IDLE;
        end

    WAIT_FOR_DBG:
        begin
            if (qual_DBG_ == 1'b0)
                next_dstate = FIRST_BEAT;
            else next_dstate = WAIT_FOR_DBG;
        end

    FIRST_BEAT:
        begin
            dbb_reg_ = 1'b0;
            dbb_en_ = 1'b0;
            latch0 = 1'b1;
            if (TA_ == 1'b1)
                next_dstate = FIRST_BEAT;
            else next_dstate = SECOND_BEAT;
        end

    SECOND_BEAT:
        begin
            dbb_reg_ = 1'b0;
            dbb_en_ = 1'b0;
            latch1 = 1'b1;
            if (TA_ == 1'b1)
                next_dstate = SECOND_BEAT;
            else next_dstate = THIRD_BEAT;
        end

    THIRD_BEAT:
        begin
            dbb_reg_ = 1'b0;
            dbb_en_ = 1'b0;
            latch2 = 1'b1;
            if (TA_ == 1'b1)
                next_dstate = THIRD_BEAT;
            else next_dstate = FOURTH_BEAT;
        end

    FOURTH_BEAT:
        begin
            dbb_reg_ = 1'b0;
            dbb_en_ = 1'b0;
            latch3 = 1'b1;
            if (TA_ == 1'b1)
                next_dstate = FOURTH_BEAT;
            else next_dstate = FETCH_TERMINATE;
        end

    FETCH_TERMINATE:
        begin
            dbb_reg_ = 1'b1;
            dbb_en_ = 1'b0;
            if (data_parity_error == 1'b1)
                next_dstate = ABORT1;
            else next_dstate = UPLOAD1;
        end

    UPLOAD1:
        begin
            dataline_en_ = 1'b0;
            fetch_done = 1'b1;
            upload = 1'b1;
            next_dstate = D_WAIT_FOR_NOT_FETCH_A;
        end

    ABORT1:
        begin
            fetch_abort = 1'b1;
            next_dstate = D_WAIT_FOR_NOT_FETCH_B;
        end

    D_WAIT_FOR_NOT_FETCH_A:
        begin
            dataline_en_ = 1'b0; // To meet data hold requirements of hsram
                                 // in Data List.
            fetch_done = 1'b1;
            if (fetch == 1'b1)
                next_dstate = D_WAIT_FOR_NOT_FETCH_A;
            else next_dstate = D_IDLE;
        end

    D_WAIT_FOR_NOT_FETCH_B:
        begin
            fetch_abort = 1'b1;
            if (fetch == 1'b1)
                next_dstate = D_WAIT_FOR_NOT_FETCH_B;
            else next_dstate = D_IDLE;
        end


    START_SEND:
        begin
            cancel = 1'b1;
            download = 1'b1;
            latch0 = 1'b1;
            latch1 = 1'b1;
            latch2 = 1'b1;
            latch3 = 1'b1;
            mux_sel = 1'b1;
            if (burst_start == 2'd0) next_dstate = SEND00;
            else if (burst_start == 2'd1) next_dstate = SEND11;
            else if (burst_start == 2'd2) next_dstate = SEND22;
            else if (burst_start == 2'd3) next_dstate = SEND33;
            else next_dstate = START_SEND;
        end

    SEND00:
        begin
            ta_en_ = 1'b0;
            d_en0_ = 1'b0;
            next_dstate = SEND01;
        end

    SEND01:
        begin
            ta_en_ = 1'b0;
            d_en1_ = 1'b0;
            next_dstate = SEND02;
        end

    SEND02:
        begin
            ta_en_ = 1'b0;
            d_en2_ = 1'b0;
            next_dstate = SEND03;
        end

    SEND03:
        begin
            ta_en_ = 1'b0;
            d_en3_ = 1'b0;
            send_done = 1'b1;
            next_dstate = SEND_TERMINATE;
        end

    SEND11:
        begin
            ta_en_ = 1'b0;
            d_en1_ = 1'b0;
            next_dstate = SEND12;
        end

    SEND12:
        begin
            ta_en_ = 1'b0;
            d_en2_ = 1'b0;
            next_dstate = SEND13;
        end

    SEND13:
        begin
            ta_en_ = 1'b0;
            d_en3_ = 1'b0;
            next_dstate = SEND10;
        end

    SEND10:
        begin
            ta_en_ = 1'b0;
            d_en0_ = 1'b0;
            send_done = 1'b1;
            next_dstate = SEND_TERMINATE;
        end

    SEND22:
        begin
            ta_en_ = 1'b0;
            d_en2_ = 1'b0;
            next_dstate = SEND23;
        end

    SEND23:
        begin
            ta_en_ = 1'b0;
            d_en3_ = 1'b0;
            next_dstate = SEND20;
        end

    SEND20:
        begin
            ta_en_ = 1'b0;
            d_en0_ = 1'b0;
            next_dstate = SEND21;
        end

    SEND21:
        begin
            ta_en_ = 1'b0;
            d_en1_ = 1'b0;
            send_done = 1'b1;
            next_dstate = SEND_TERMINATE;
        end

    SEND33:
        begin
            ta_en_ = 1'b0;
            d_en3_ = 1'b0;
            next_dstate = SEND30;
        end

    SEND30:
        begin
            ta_en_ = 1'b0;
            d_en0_ = 1'b0;
            next_dstate = SEND31;
        end

    SEND31:
        begin
            ta_en_ = 1'b0;
            d_en1_ = 1'b0;
            next_dstate = SEND32;
        end

    SEND32:
        begin
            ta_en_ = 1'b0;
            d_en2_ = 1'b0;
            send_done = 1'b1;
            next_dstate = SEND_TERMINATE;
        end

    SEND_TERMINATE:
        begin
            next_dstate = D_IDLE;
        end

    default:
        begin
            next_dstate = dc_dstate;
        end

    endcase
end

endmodule


1. Odd Parity Checker/Generator With 256 Inputs


* ODD PARITY CHECKER AND GENERATOR

* Filename: parityo_chkgen256.v

* Author: Joseph R. Robert, Jr.

* Date: 29FEB96

* Revised: 29FEB96

*

Purpose: This module checks the parity of the input data, comparing it to the input parity. Parity is odd including
the parity bit. This module also generates the parity for the input data in groups of eight input bits.
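
The odd-parity rule stated above can be sketched for a single eight-bit group as follows. This is an illustration only: the module name parity8_sketch is hypothetical, and the behavior of Epoch's paritycgo cell is assumed to follow the same rule rather than confirmed from its documentation.

```verilog
// Sketch of odd parity generation and checking for one byte.
module parity8_sketch (D,PIN,ERROR,PGEN);   // hypothetical name
input  [7:0] D;
input        PIN;
output       ERROR,PGEN;

// Odd parity: the parity bit makes the total number of 1s odd.
assign PGEN  = ~^D;          // 1 when D holds an even number of 1s

// The nine received bits must contain an odd number of 1s.
assign ERROR = ~^{D,PIN};    // 1 when the count of 1s is even
endmodule
```

For D = 8'h00 the generated parity bit is 1; receiving that byte with PIN = 1 raises no error, while PIN = 0 does.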




module parityo_chkgen256 (D,PIN,ERROR,PGEN); 
// epoch set_attribute FIXEDBLOCK = 1 


input [255:0] D; 
input [31:0] PIN; 
output [31:0] PGEN; 
output ERROR; 


wire ERROR_0,ERROR_1,ERROR_2,ERROR_3,ERROR; 


parityo_chk64 parity_group_0 
(.D(D[ 63: 0]),.PIN(PIN[ 7: 0]),.ERROR(ERROR_0),.PGEN(PGEN[ 7: 0])); 
parityo_chk64 parity_group_1 
(.D(D[127: 64]),.PIN(PIN[15: 8]),.ERROR(ERROR_1),.PGEN(PGEN[15: 8])); 
parityo_chk64 parity_group_2 
(.D(D[191:128]),.PIN(PIN[23:16]),.ERROR(ERROR_2),.PGEN(PGEN[23:16])); 
parityo_chk64 parity_group_3 
(.D(D[255:192]),.PIN(PIN[31:24]),.ERROR(ERROR_3),.PGEN(PGEN[31:24])); 


stdor4 OR_A (ERROR_0,ERROR_1,ERROR_2,ERROR_3,ERROR); 


endmodule 

module parityo_chk64 (D,PIN,ERROR,PGEN); 




input [63:0] D; 
input [7:0] PIN; 
output [7:0] PGEN; 
output ERROR; 


wire ERROR_0,ERROR_1,ERROR_2,ERROR_3,ERROR_4,ERROR_5,ERROR_6,ERROR_7,ERROR_A, 
ERROR_B,ERROR; 


paritycgo #(8,0,"AUTO","1") 
parity_group_0 (.D(D[ 7: 0]),.PIN(PIN[0]),.ERROR(ERROR_0),.PGEN(PGEN[0])); 
paritycgo #(8,0,"AUTO","1") 
parity_group_1 (.D(D[15: 8]),.PIN(PIN[1]),.ERROR(ERROR_1),.PGEN(PGEN[1])); 
paritycgo #(8,0,"AUTO","1") 
parity_group_2 (.D(D[23:16]),.PIN(PIN[2]),.ERROR(ERROR_2),.PGEN(PGEN[2])); 
paritycgo #(8,0,"AUTO","1") 
parity_group_3 (.D(D[31:24]),.PIN(PIN[3]),.ERROR(ERROR_3),.PGEN(PGEN[3])); 
paritycgo #(8,0,"AUTO","1") 
parity_group_4 (.D(D[39:32]),.PIN(PIN[4]),.ERROR(ERROR_4),.PGEN(PGEN[4])); 
paritycgo #(8,0,"AUTO","1") 
parity_group_5 (.D(D[47:40]),.PIN(PIN[5]),.ERROR(ERROR_5),.PGEN(PGEN[5])); 
paritycgo #(8,0,"AUTO","1") 
parity_group_6 (.D(D[55:48]),.PIN(PIN[6]),.ERROR(ERROR_6),.PGEN(PGEN[6])); 
paritycgo #(8,0,"AUTO","1") 
parity_group_7 (.D(D[63:56]),.PIN(PIN[7]),.ERROR(ERROR_7),.PGEN(PGEN[7])); 


stdor4 OR_A (ERROR_0,ERROR_1,ERROR_2,ERROR_3,ERROR_A); 
stdor4 OR_B (ERROR_4,ERROR_5,ERROR_6,ERROR_7,ERROR_B); 
stdor2 OR_C (ERROR_A,ERROR_B,ERROR); 


endmodule 
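

The listing above relies on the Epoch library cell paritycgo, whose internals are not shown in this appendix. As a minimal behavioral sketch of what one eight-bit group is described as doing (odd parity including the parity bit), the following model may help. The module names parity_odd8_sketch and parity_odd8_sketch_tb are hypothetical, and the active-high polarity of ERROR is an assumption inferred from the OR gates that combine the group errors; this is not the library cell itself.

```verilog
// Hypothetical behavioral stand-in for one 8-bit odd-parity group.
// Assumption: ERROR is active high, as suggested by the stdor4/stdor2 tree.
module parity_odd8_sketch (D, PIN, ERROR, PGEN);
  input  [7:0] D;      // eight data bits
  input        PIN;    // parity bit received with the data
  output       ERROR;  // 1 when the received parity is wrong
  output       PGEN;   // parity bit generated for D

  // Odd parity including the parity bit: D together with PGEN
  // must contain an odd number of ones.
  assign PGEN  = ~(^D);

  // D together with PIN containing an even number of ones is an error.
  assign ERROR = ~(^{D, PIN});
endmodule

// Small self-contained testbench exercising one good and one bad case.
module parity_odd8_sketch_tb;
  reg  [7:0] d;
  reg        pin;
  wire       error, pgen;

  parity_odd8_sketch dut (.D(d), .PIN(pin), .ERROR(error), .PGEN(pgen));

  initial begin
    // No ones in D, parity bit 1: total is odd, so PGEN=1 and ERROR=0.
    d = 8'h00; pin = 1'b1; #1;
    $display("D=%h PIN=%b PGEN=%b ERROR=%b", d, pin, pgen, error);

    // Three ones in D, parity bit 1: total is even, so PGEN=0 and ERROR=1.
    d = 8'h07; pin = 1'b1; #1;
    $display("D=%h PIN=%b PGEN=%b ERROR=%b", d, pin, pgen, error);

    $finish;
  end
endmodule
```

Under these assumptions, the 64-input and 256-input checkers are simply eight and thirty-two such groups whose ERROR outputs are ORed together, which matches the structure of the listings above.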
2. Odd Parity Generator With 32 Inputs 


/****************************************************************************** 
* ODD PARITY GENERATOR 
* Filename: parityo_gen32.v 
* Author: Joseph R. Robert, Jr. 
* Date: 29FEB96 
* Revised: 29FEB96 
* 
* Purpose: This module generates an odd parity bit for each group of eight inputs. 
******************************************************************************/ 


module parityo_gen32 (D,PGEN); 


input [31:0] D; 
output [3:0] PGEN; 
wire [3:0] PGEN; 


parityo #(8,0,"AUTO","1") parity_group_0 (.D(D[ 7: 0]),.PGEN(PGEN[0])); 
parityo #(8,0,"AUTO","1") parity_group_1 (.D(D[15: 8]),.PGEN(PGEN[1])); 
parityo #(8,0,"AUTO","1") parity_group_2 (.D(D[23:16]),.PGEN(PGEN[2])); 
parityo #(8,0,"AUTO","1") parity_group_3 (.D(D[31:24]),.PGEN(PGEN[3])); 


endmodule 


TEST RESULTS 


Host command: verilog 
Command arguments: 
-f verilog_arguments 
-v /tmp_mnt/h/joshua_u2/jrrobert/thesis/epoch/primlib.v 
prc.v 
prc_top.v 
sequencer4.v 
tarbiter.v 
tcpu.v 
testbench.v 
tmemory.v 


VERILOG-XL 2.1.2 log file created Mar 19, 1996 11:53:03 
VERILOG-XL 2.1.2 Mar 19, 1996 11:53:03 


Copyright (c) 1994 Cadence Design Systems, Inc. All Rights Reserved. 
Unpublished -- rights reserved under the copyright laws of the United States. 


Copyright (c) 1994 UNIX Systems Laboratories, Inc. Reproduced with Permission. 


THIS SOFTWARE AND ON-LINE DOCUMENTATION CONTAIN CONFIDENTIAL INFORMATION 
AND TRADE SECRETS OF CADENCE DESIGN SYSTEMS, INC. USE, DISCLOSURE, OR 
REPRODUCTION IS PROHIBITED WITHOUT THE PRIOR EXPRESS WRITTEN PERMISSION OF 
CADENCE DESIGN SYSTEMS, INC. 

RESTRICTED RIGHTS LEGEND 


Use, duplication, or disclosure by the Government is subject to 
restrictions as set forth in subparagraph (c)(1)(ii) of the Rights in 
Technical Data and Computer Software clause at DFARS 252.227-7013 or 
subparagraphs (c)(1) and (2) of Commercial Computer Software -- Restricted 
Rights at 48 CFR 52.227-19, as applicable. 


Cadence Design Systems, Inc. 


555 River Oaks Parkway 
San Jose, California 95134 




For technical assistance please contact the Cadence Response Center at 
1-800-CADENC2 or send email to crc_customers@cadence.com 


For more information on Cadence's Verilog-XL product line send email to 
talkverilog@cadence.com 


Compiling source file "prc.v" 

Compiling source file "prc_top.v" 

Compiling source file "sequencer4.v" 

Compiling source file "tarbiter.v" 

Compiling source file "tcpu.v" 

Compiling source file "testbench.v" 

Compiling source file "tmemory.v" 

Scanning library file "/tmp_mnt/h/joshua_u2/jrrobert/thesis/epoch/primlib.v" 
Scanning library file "/tmp_mnt/h/joshua_u2/jrrobert/thesis/epoch/primlib.v" 


Warning! Implicit wire has no fanin [Verilog-IWFA] 
“pic. v= 251572 NGU 


Warning! Implicit wire has no fanin [Verilog-IWFA] 
“Dicey «Zoi 50. NC] 


Warning! Implicit wire has no fanin [Verilog-IWFA] 
‘pIC.v , 23159 NGO 


Warning! Implicit wire has no fanin [Verilog-IWFA] 
“DIC.Y sono: NEI 

Highest level modules: 

testbench 


*** SDF Annotator version 1.6_beta.3 
*** SDF file: /tmp_mnt/h/joshua_u2/jrrobert/thesis/verilog/hardware/prc.sdf 
***  Back-annotation scope: testbench.PRC1.PRC1 

*** No configuration file specified - using default options 

*** SDF Annotator log file: sdf.log 

*** No MTM selection parameter specified 


*** No SCALE FACTORS parameter specified 
*** No SCALE TYPE parameter specified 
Configuring for back-annotation... 
Reading SDF file and back-annotating timing data... 
*** SDF back-annotation successfully completed 
PRC granted the data bus. 


(ERROR): WR and A are both unknown at time 6.700 
(ERROR): WR and A are both unknown at time 6.700 


(ERROR): WR and A are both unknown at time 6.700 

(ERROR) WR transition to unknown and (din != MEM[a]) at time 7.000 
Instance: testbench.PRC1.PRC1.LM1.MRMA_list.hsram.inst1 

(ERROR) WR transition to unknown and (din != MEM[a]) at time 7.000 
Instance: testbench.PRC1.PRC1.DL1.data_ram1.hsram.inst1 

(ERROR) WR transition to unknown and (din != MEM[a]) at time 7.000 
Instance: testbench.PRC1.PRC1.DL1.data_ram0.hsram.inst1 


System hard reset at time oD: 

CPU started read from address 00000000 at time 45. 
CPU read: 0001020304050607 at 211 
CPU read: 08090a0b0c0d0e0f at 271 
CPU read: 1011121314151617 at 331 

PRC requested the bus. 

CPU read: 18191a1b1c1d1e1f at 391 

CPU started read from address 00000020 at time 420. 
CPU read: 2021222324252627 at 556 
CPU read: 28292a2b2c2d2e2f at 616 
CPU read: 3031323334353637 at 676 
CPU read: 38393a3b3c3d3e3f at 736 

PRC granted the data bus. 

CPU started read from address 00000180 at time 125: 
CPU read: 0001020304050607 at 1381 
CPU read: 08090a0b0c0d0e0f at 1441 
CPU read: 1011121314151617 at 1501 
CPU read: 18191a1b1c1d1e1f at 1561 

CPU started read from address 000001a0 at time 1665. 
CPU read: 2021222324252627 at 1831 

PRC requested the bus. 

CPU read: 28292a2b2c2d2e2f at 1891 
CPU read: 3031323334353637 at 1951 
CPU read: 38393a3b3c3d3e3f at 2011 

PRC granted the data bus. 

CPU started read from address 00000040 at time 2490. 
CPU read: 4041424344454647 at 2641 
CPU read: 4041424344454647 at 2656 
CPU read: 5051525354555657 at 2671 
CPU read: 4041424344454647 at 2686 

PRC requested the bus. 

PRC granted the data bus. 

CPU started write to address 000001c0 at time 3307. 
CPU write beat 1: 7777777777777777 at 3428 
CPU write beat 2: 8888888888888888 at 3488 
CPU write beat 3: 1111111111111111 at 3548 
CPU write beat 4: 3333333333333333 at 3608 

CPU started read from address 00000060 at time B70): 
CPU read: 6061626364656667 at 3916 
CPU read: 6061626364656667 at 3931 
CPU read: 7071727374757677 at 3946 
CPU read: 6061626364656667 at 3961 

PRC requested the bus. 
PRC granted the data bus. 


CPU started read from address 000001c0 at time 4440. 
CPU read: 7777777777777777 at 4606 
CPU read: 8888888888888888 at 4666 
CPU read: 1111111111111111 at 4726 
CPU read: 3333333333333333 at 4786 
L125 "testbench.v": $finish at simulation time 5035000 
4 warnings 


158647 simulation events + 266655 accelerated events + 926440 timing check events 
CPU time: 6.1 secs to compile + 161.8 secs to link + 377.5 secs in simulation 
End of VERILOG-XL 2.1.2 Mar 19, 1996 12:15:44 